STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. #10211

xadupre merged 24 commits into microsoft:master from
Conversation
```diff
@@ -0,0 +1,35 @@
+if (onnxruntime_USE_STVM)
```
You still need to add nuphar and all its dependencies that will be used in your build to cgmanifest.json.
TVM dependencies are listed here: https://github.com/apache/tvm/blob/main/LICENSE#L204. Some of them are checked in the repository and the sources are difficult to trace (cma). Another comes from another dependency already listed in this file (compiler-rt / llvm, blockingconcurrentqueue.h). I added "comments": "dependency from tvm" for the newly added dependencies.
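As a hedged illustration, a cgmanifest entry for one of the TVM dependencies mentioned above (dmlc-core) could look like the sketch below; the commit hash is a placeholder and must be pinned to the submodule commit actually used in the build:

```json
{
  "Registrations": [
    {
      "component": {
        "type": "git",
        "git": {
          "commitHash": "<pin to the dmlc-core commit used by tvm>",
          "repositoryUrl": "https://github.com/dmlc/dmlc-core.git"
        }
      },
      "comments": "dependency from tvm"
    }
  ]
}
```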
docs/STVM_EP.md (outdated)

```diff
-cd onnxruntime/cmake/external/tvm_update/
+cd onnxruntime/cmake/external/tvm/
```
Why do users need to manually build TVM first? If so, it seems we don't need to clone TVM. We could treat TVM like the CUDA SDK and ask the user to provide the install path of the software.
In whatever case, we still need to track the software with cgmanifest.json.
Hello @xadupre, I think it is a good idea to leave one tvm submodule instead of two, but you should know the following things. 1. Nuphar and STVM are different EPs, although they are both based on Apache TVM. They are developed by different teams and use different TVM hashes. For STVM I do not think that v0.8.0 is a good choice; we use a very fresh commit. Separately, it should be agreed with the Nuphar team.
Both NUPHAR and STVM use TVM, but not the same versions (nuphar uses a fork not updated in two years). I used whatever was already checked in. Between 0.5 and 0.8, TVM updated its C++ API, so updating TVM for nuphar would require many changes (see #10183). This PR does not remove TVM; it fetches the TVM sources only when they are needed. The build does not change for nuphar or any other configuration. The goal of this PR is to leave the build unchanged while cloning tvm only when it is needed.
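The fetch-only-when-needed approach could be sketched roughly as below in CMake. This is a hypothetical illustration, not the PR's actual code: the repository URL, tag, and option name are assumptions for the sake of the example.

```cmake
# Hypothetical sketch: fetch TVM sources only when an EP that needs them
# is enabled, instead of keeping tvm as an always-cloned git submodule.
include(FetchContent)

if (onnxruntime_USE_STVM)
  FetchContent_Declare(
    tvm
    GIT_REPOSITORY https://github.com/apache/tvm.git
    GIT_TAG        v0.8.0   # the commit actually pinned may differ per EP
  )
  FetchContent_MakeAvailable(tvm)
endif()
```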
Makes sense to me as well, thanks for the improvement @xadupre
Hello @xadupre, I recommend you check the build of TVM inside ORT without the USE_MICRO key. The latter activates tvmc, which allows working with TVM through shell commands; I do not think ORT needs that. We will also check it tomorrow on our side.
I tried, but it was crashing when running "import onnxruntime.providers.stvm", so I wanted to try with the same settings given in the documentation. To be more precise, I get the following.
Hello @xadupre, we are concerned about a possible situation where this PR is merged because CI passed, but in fact it does not work for an ORT build with TVM, a separate TVM build, a TVM whl-package build, and so on. Maybe we can help you, or we can communicate/test to be sure that everything works well before it is merged?
Hello @vvchernov, the only way to make sure of it is to add a CI job with a unit test. It is working for me with the following steps: Then a simple script: Now I need to check if it also works after installing both *.whl packages built during this build.
Hello @xadupre,
Hi @KJlaccHoeUM9l, this PR was merged and I built with this one included. It did not work for me. I'll try again.
Sorry, it was my mistake in PR #10241: `if package_name == 'onnxruntime-stvm':`. Maybe it will be faster to make this change within this PR? Or should I open a new PR?
@KJlaccHoeUM9l, I fixed it in this PR. I added a small example to check that the provider is working. Is it possible to review it? It fails for me: https://github.com/microsoft/onnxruntime/pull/10211/files#diff-c6a93b897de7a145372d94463bb7aa6030f3bd3578c50548a261025b6825ade6R53. The inference works, but CPUExecutionProvider and StvmExecutionProvider do not give the same output.
@xadupre which model did you use for test? |
The test has to be fast, so I did not use any predefined model. The model is created in the script using onnx functions. It is a linear regression (XA + B).
@xadupre, this error is due to the fact that your model does not have fixed tensor shapes. At the moment, STVM EP in automatic mode can only work with models that have fixed shapes.

```python
X = make_tensor_value_info('X', TensorProto.FLOAT, [1, 2])
A = make_tensor_value_info('A', TensorProto.FLOAT, [2, 2])
B = make_tensor_value_info('B', TensorProto.FLOAT, [1, 2])
Y = make_tensor_value_info('Y', TensorProto.FLOAT, [1, 2])
```
I suggest raising an exception if the model has a dynamic shape and input_shapes is not specified.
```cmake
set_target_properties(tvm_runtime PROPERTIES FOLDER ${tvm_SOURCE_DIR})

set(TVM_INCLUDES ${tvm_SOURCE_DIR}/include
    ${tvm_SOURCE_DIR}/3rdparty/dmlc-core/include
```
So, please also add dmlc-core and dlpack to our cgmanifest.json. In the json file you may add a comment saying they are for TVM.
I don't know if it has other dependencies. If you know any, please also add.
snnn left a comment:
LGTM. You may merge it and continue to improve the code.
Great, thank you @xadupre for getting this PR through, and @snnn, @KJlaccHoeUM9l and @vvchernov for the review.
```cmake
message(STATUS "TVM BEFORE USE_LLVM=${USE_LLVM} USE_OPENMP=${USE_OPENMP} USE_MICRO=${USE_MICRO} CMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} USE_CUDA=${USE_CUDA} USE_GTEST=${USE_GTEST} CMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}")
message(STATUS "tvm_SOURCE_DIR=${tvm_SOURCE_DIR}")
message(STATUS "tvm_BINARY_DIR=${tvm_BINARY_DIR}")
add_subdirectory(${tvm_SOURCE_DIR} ${tvm_BINARY_DIR} EXCLUDE_FROM_ALL)
```
@xadupre Do we require EXCLUDE_FROM_ALL here? It prevents "tvm" from being installed when "onnxruntime" is installed.
…t null. (microsoft#10211)

* STVM, checks pointers are not null.
* removes submodules tvm
* add missing include(FetchContent)
* add target tvm
* fix stvm test
* extend cgmanifest with dependencies of tvm

(cherry picked from commit 481b96d)
Description:
Motivation and Context
Help identify why it fails. Speed up CI.