STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null.#10211

Merged
xadupre merged 24 commits into microsoft:master from xadupre:stvm1
Jan 27, 2022

Conversation

@xadupre (Member) commented Jan 6, 2022

Description:

  • Checks that pointers are not null to avoid crashes.
  • Removes tvm from the list of submodules and uses FetchContent to retrieve the sources instead. This should speed up the build for every other scenario.

Motivation and Context
Helps identify why it fails. Speeds up CI.
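The FetchContent approach described in this PR might look roughly like this (a minimal sketch; the repository URL is real, but the tag and surrounding structure are assumptions, not the PR's actual code):

```cmake
include(FetchContent)

if (onnxruntime_USE_STVM)
  # Clone TVM only when the STVM EP is requested, instead of always
  # checking it out as a git submodule for every build scenario.
  FetchContent_Declare(
    tvm
    GIT_REPOSITORY https://github.com/apache/tvm.git
    GIT_TAG        v0.8.0  # assumed tag for illustration
  )
  FetchContent_MakeAvailable(tvm)
endif()
```

With this shape, builds that do not enable the STVM EP never pay the cost of cloning TVM.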

@xadupre xadupre changed the title STVM, checks pointers are not null. STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. Jan 11, 2022
@xadupre xadupre changed the title STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. [WIP] STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. Jan 12, 2022
@@ -0,0 +1,35 @@
if (onnxruntime_USE_STVM)
Contributor

You still need to add nuphar and all its dependencies that will be used in your build to cgmanifest.json.

@xadupre (Member Author) Jan 27, 2022

TVM dependencies are listed here: https://github.com/apache/tvm/blob/main/LICENSE#L204. Some of them are checked into the repository and their sources are difficult to trace (cma). Another comes from a dependency already listed in this file (compiler-rt / llvm, blockingconcurrentqueue.h). I added "comments": "dependency from tvm" for the newly added dependencies.

docs/STVM_EP.md Outdated

```
cd onnxruntime/cmake/external/tvm_update/
cd onnxruntime/cmake/external/tvm/
```
Contributor

Why do users need to manually build TVM first? If that is required, it seems we don't need to clone TVM at all: we can treat TVM like the CUDA SDK and ask the user to provide the install path of the software.

In whatever case, we still need to track the software with cgmanifest.json.

Member Author

It comes from PR #10019. The first draft required building TVM manually. There is also ongoing work in PR #10260 (renaming).

@vvchernov (Contributor)

Hello @xadupre, I think it is a good idea to leave one tvm submodule instead of two, but you should know the following things:

1. Nuphar and STVM are different EPs, although both are based on Apache TVM. They are developed by different teams and use different TVM hashes. For STVM I do not think that v0.8.0 is a good choice; we use a very fresh commit. This should be agreed separately with the Nuphar team.
2. If you remove the tvm submodule, TVM needs to live elsewhere and be built before ORT. In particular, the so-file from TVM is used for the whl-package build (onnxruntime-stvm). Moreover, the ORT build system is big enough that you likely need to check (update) more places for correct work without the tvm submodule.
3. Building TVM outside of ORT becomes the responsibility of the client. But it should be noted that TVM is a target-dependent project: with which keys do you plan to build it for ORT? There is no universal answer. Our idea was to add cmake keys to build TVM together with ORT, based on the keys from ORT: for GPU we use CUDA or Metal or Vulkan, for CPU LLVM, and so on.
4. Please also take into account PR #10260.

@xadupre (Member Author) commented Jan 14, 2022

Both NUPHAR and STVM use TVM, but not the same versions (nuphar uses a fork that has not been updated in two years). I used whatever was already checked in. Between 0.5 and 0.8, TVM updated its C++ API, so updating TVM for nuphar would require many changes (see #10183). This PR does not remove TVM; it fetches the TVM sources only when they are needed. The build does not change for nuphar or any other scenario. The goal of this PR is to leave the build unchanged but clone tvm only when it is needed.

@tmoreau89

Thank you @xadupre - I think we can go ahead and merge this PR if it's ready, after which we can go ahead and land the following PR: #10260

@jroesch (Contributor) commented Jan 21, 2022

Makes sense to me as well, thanks for the improvement @xadupre

@vvchernov (Contributor)

Hello @xadupre, I recommend you check the build of TVM inside ORT without the USE_MICRO key. That key activates tvmc, which allows working with TVM via shell commands; I do not think ORT needs it. We will also check it tomorrow on our side.

@xadupre (Member Author) commented Jan 24, 2022

I tried, but it was crashing when running "import onnxruntime.providers.stvm", so I wanted to try with the same settings given in the documentation. To be more precise, I get the following:

```
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [11:12:32] onnxruntime/build/cpu_stvm/Release/_deps/tvm-src/src/runtime/registry.cc:69:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (can_override) is false: Global PackedFunc arith.CreateAnalyzer is already registered

Aborted (core dumped)
```

@vvchernov
Copy link
Contributor

Hello @xadupre, we are concerned about a possible situation where this PR is merged because CI passes, but in fact it does not work for the ORT build with TVM, a separate TVM build, the TVM whl-package build, and so on. Maybe we can help you, or communicate/test to be sure that everything works well before it is merged?

@xadupre (Member Author) commented Jan 26, 2022

Hello @vvchernov, the only way to make sure of that is to add a CI job with a unit test. It works for me with the following steps:

```
python3 ./tools/ci_build/build.py --config Release --skip_tests --build_wheel --parallel --build_dir ./build/cpu_stvm --use_stvm
export PYTHONPATH=~/github/onnxruntime/build/cpu_stvm/Release/:~/github/onnxruntime/build/cpu_stvm/Release/_deps/tvm-src/python
```

Then a simple script:

```python
import tvm  # needs to be imported first
import onnxruntime
# ... the rest of the script
```

Now I need to check whether it also works after installing both *.whl files built during this build.

@KJlaccHoeUM9l (Contributor) commented Jan 26, 2022

Hello @xadupre, import tvm will be done automatically once PR #10241 is merged.
When building the .whl package for onnxruntime-stvm, this import is automatically inserted into _ld_preload.py. Thus, import tvm is called at the beginning of the entire chain of imports in onnxruntime.
Therefore, import tvm can be removed from your test script.
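The auto-inserted import described above could look roughly like this inside the generated `_ld_preload.py` (hypothetical content for illustration; the real file is produced by the packaging scripts):

```python
# Hypothetical sketch of a generated _ld_preload.py for the
# onnxruntime-stvm wheel: import tvm before onnxruntime loads its native
# library, so TVM's runtime symbols are registered exactly once.
try:
    import tvm  # noqa: F401  -- must come before onnxruntime's native import
    TVM_PRELOADED = True
except ImportError:
    # TVM is not installed; the STVM EP will simply be unavailable.
    TVM_PRELOADED = False
```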

@xadupre (Member Author) commented Jan 26, 2022

Hi @KJlaccHoeUM9l, that PR was merged and I built with it included. It did not work for me. I'll try again.

@xadupre xadupre changed the title [WIP] STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. Jan 26, 2022
@KJlaccHoeUM9l (Contributor)

Sorry, it was my mistake in PR #10241.
Due to active work on a branch in which STVM has already been renamed to TVM, I made a mistake on this line.
For it to work correctly, you need to change the line as follows:

```python
if package_name == 'onnxruntime-stvm':
```

Maybe it will be faster to make this change within this PR? Or should I open a new PR?

@xadupre (Member Author) commented Jan 26, 2022

@KJlaccHoeUM9l, I fixed it in this PR. I also added a small example to check that the provider is working. Is it possible to review it? It fails for me: https://github.com/microsoft/onnxruntime/pull/10211/files#diff-c6a93b897de7a145372d94463bb7aa6030f3bd3578c50548a261025b6825ade6R53. The inference works, but CPUExecutionProvider and StvmExecutionProvider do not give the same output.

@xadupre xadupre requested a review from snnn January 26, 2022 15:01
@vvchernov (Contributor)

> @KJlaccHoeUM9l, I fixed it in this PR. I also added a small example to check that the provider is working. Is it possible to review it? It fails for me: https://github.com/microsoft/onnxruntime/pull/10211/files#diff-c6a93b897de7a145372d94463bb7aa6030f3bd3578c50548a261025b6825ade6R53. The inference works, but CPUExecutionProvider and StvmExecutionProvider do not give the same output.

@xadupre, which model did you use for the test?

@xadupre (Member Author) commented Jan 26, 2022

> @xadupre, which model did you use for the test?

The test has to be fast, so I did not use any predefined model. The model is created in the script using onnx functions. It is a linear regression (XA + B).
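The failing comparison can be reproduced with plain numpy: compute the reference Y = X·A + B and compare a provider's output against it with a tolerance. This is a hedged sketch of the comparison only; `provider_output` stands in for the result an InferenceSession on StvmExecutionProvider would return, which is not runnable here:

```python
import numpy as np

def reference_linreg(X, A, B):
    # Reference result for the test model: Y = X @ A + B.
    return X @ A + B

rng = np.random.default_rng(0)
X = rng.standard_normal((1, 2)).astype(np.float32)
A = rng.standard_normal((2, 2)).astype(np.float32)
B = rng.standard_normal((1, 2)).astype(np.float32)

expected = reference_linreg(X, A, B)
# In the PR's unit test this would come from session.run(...) on the
# STVM provider; here we reuse the reference so the sketch is runnable.
provider_output = reference_linreg(X, A, B)

outputs_match = np.allclose(expected, provider_output, rtol=1e-5)
print(outputs_match)  # True when the provider agrees with the reference
```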

@KJlaccHoeUM9l (Contributor)

@xadupre, this error is due to the fact that your model does not have fixed tensor shapes. At the moment, the STVM EP in automatic mode can only work with models that have fixed shapes.
For your test to work correctly, you need to change it as follows:

```python
X = make_tensor_value_info('X', TensorProto.FLOAT, [1, 2])
A = make_tensor_value_info('A', TensorProto.FLOAT, [2, 2])
B = make_tensor_value_info('B', TensorProto.FLOAT, [1, 2])
Y = make_tensor_value_info('Y', TensorProto.FLOAT, [1, 2])
```

@xadupre (Member Author) commented Jan 26, 2022

I suggest raising an exception if the model has dynamic shapes and input_shapes is not specified.
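Such a guard might be sketched as follows (a hypothetical helper; the function and option names are assumptions, not the EP's actual API):

```python
def check_fixed_shapes(input_shapes, user_input_shapes=None):
    """Raise if any input dimension is dynamic and the user did not
    supply an explicit input_shapes option."""
    if user_input_shapes is not None:
        return  # user provided concrete shapes; nothing to check
    for name, shape in input_shapes.items():
        for dim in shape:
            # In ONNX value infos a dynamic axis is a string (e.g.
            # "batch") or None rather than a positive integer.
            if not isinstance(dim, int) or dim <= 0:
                raise ValueError(
                    f"Input '{name}' has dynamic dimension {dim!r}; "
                    "the STVM EP requires fixed shapes or an explicit "
                    "input_shapes option.")

check_fixed_shapes({"X": [1, 2]})  # fixed shapes: passes silently
try:
    check_fixed_shapes({"X": ["batch", 2]})
    dynamic_rejected = False
except ValueError:
    dynamic_rejected = True
print(dynamic_rejected)  # True: the dynamic shape is rejected
```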

set_target_properties(tvm_runtime PROPERTIES FOLDER ${tvm_SOURCE_DIR})

set(TVM_INCLUDES ${tvm_SOURCE_DIR}/include
${tvm_SOURCE_DIR}/3rdparty/dmlc-core/include
Contributor

So, please also add dmlc-core and dlpack to our cgmanifest.json. In the json file you may add a comment saying they are for TVM.

Contributor

I don't know if it has other dependencies. If you know any, please also add.

@snnn (Contributor) left a comment

LGTM. You may merge it and continue to improve the code.

@xadupre xadupre merged commit 481b96d into microsoft:master Jan 27, 2022
@prateek9623 (Contributor)

@xadupre hey, can you help me with the NUPHAR-TVM cmake install in this PR: #8919

@tmoreau89

Great, thank you @xadupre for getting this PR through and @snnn @KJlaccHoeUM9l and @vvchernov for the review.

message(STATUS "TVM BEFORE USE_LLVM=${USE_LLVM} USE_OPENMP=${USE_OPENMP} USE_MICRO=${USE_MICRO} CMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} USE_CUDA=${USE_CUDA} USE_GTEST=${USE_GTEST} CMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}")
message(STATUS "tvm_SOURCE_DIR=${tvm_SOURCE_DIR}")
message(STATUS "tvm_BINARY_DIR=${tvm_BINARY_DIR}")
add_subdirectory(${tvm_SOURCE_DIR} ${tvm_BINARY_DIR} EXCLUDE_FROM_ALL)
Contributor

@xadupre Do we require EXCLUDE_FROM_ALL here? It prevents "tvm" from being installed when "onnxruntime" is installed.

petersalas pushed a commit to octoml/onnxruntime that referenced this pull request Nov 7, 2022
…t null. (microsoft#10211)

* STVM, checks pointers are not null.
* removes submodules tvm
* add missing include(FetchContent)
* add target tvm
* fix stvm test
* extend cgmanifest with dependencies of tvm

(cherry picked from commit 481b96d)
@xadupre xadupre deleted the stvm1 branch January 13, 2023 11:04