-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[MIGraphx EP] Sync AMD changes upstream #25338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MIGraphx EP] Sync AMD changes upstream #25338
Conversation
Needed to maintain API compatability for ROCm 6.3.3 builds
* update * update * Fix build error CK_BUFFER_RESOURCE_3RD_DWORD Signed-off-by: Jagadish Krishnamoorthy <[email protected]> * Add all archs back + gfx950 Signed-off-by: Jagadish Krishnamoorthy <[email protected]> --------- Signed-off-by: Jagadish Krishnamoorthy <[email protected]> Co-authored-by: kithumma <[email protected]>
MIGraphX has enabled this support as part of the latest release. This allows Onnxruntime MIGraphX EP to parse in Attention and SkipLayernormalization operators and pass them to the MIGraphX API for further processing
* provider_options * use migraphx_provider_option contants * use migraphx_* constants for printing flags * add kDeviceId
* Adding support for mxr caching similar to MGX EP on Windows * Adding GenerateGraphId method * Fixing more issues * Fixing test * Adding version file * Adding charconv * Adding session input names * Add mxr filename prefix * Fixing issues with empty model cache path * Deleting previous approach of saving mxr files * use option.second directly for OrtMIGraphXProviderOptions * Revert "use option.second directly for OrtMIGraphXProviderOptions" This reverts commit 88f13e7. * Reordering headers and adding includes * Changing order include * Reordering * another fix * Adding gfx version hash to the mxr file name --------- Co-authored-by: Uros Petkovic <[email protected]> Co-authored-by: kjayapra <[email protected]> Co-authored-by: Artur Wojcik <[email protected]>
This fixes an issue we were seeing with calibration able using a spare reference for input. Caused us to try to use a stale reference which intermttently referenced a non existent or incorrect input causing calibration to fail or undefined behavior due to trying to use a reference address as a c_str() directly. Added a check using the .good() call for the ifstream when reading calibration tables as well to ensure we're not only getting a valid no null stream but no other errors are present on open
…ft#112) Co-authored-by: Uros Petkovic <[email protected]>
* Add null check for node arguments. * Add null check to Shape property of node arg.
…ft#117) * Add in BF16 datatype and quantization support to MIGraphX EP - Enable BF16 datatype in MIGraphX (migraphx_type_bf16_type) - Leverage MIGraphX API to do BF16 quantization (migraphx::quantize_bf16) - Add environment variable (MIGRAPHX_BF16_ENABLE) - Add provider session options variable (migraphx_bf16_enable) - Add python pybind API variable (migraphx_bf16_enable) * Lintrunner pass
Signed-off-by: Jagadish Krishnamoorthy <[email protected]>
The onnxruntime_add_shared_library_module() command places generated DLLs in the lib directory instead of the bin on Windows. The PR changes to use onnxruntime_add_shared_library(). I checked most of the essential EPs, and they all use onnxruntime_add_shared_library() as the method to create execution provider targets. The difference between the module and non-module versions of the command is that the module version does not install .lib files for targets it creates.
|
ping @tianleiwu changes that AMD wants to upstream that are going into ROCm 7.0 release. Likely more incoming in another pR |
|
@TedThemistokleous, please merge main since there is a recent change in lintrunner settting. Need |
…icrosoft#142) * Correct tidy warnings for MIGraphX Execution Provider * incorporate review feedback * lintrunner
| std::filesystem::path model_cache_file; | ||
| // empty cache path means the MXR caching is disabled - always compile | ||
| if (!model_cache_path_.empty()) { | ||
| model_cache_file = mgx_state->model_cache_dir / (mxr_filename_prefix + make_hash(input_shapes) + ".mxr"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part won't work good on Windows if the model_cache_dir contains non-ASCII chars.
First, in your code you need to put a comment to clarify the encoding of mgx_state->model_cache_dir. Is it ANSI encoding(that maps to a Windows codepage) or UTF-8?
Usually, most multibyte strings in ORT are UTF-8 strings. Then please convert the string to a wide string and do all the concatenation/modification in UTF-16.
On Linux all these strings should be UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all be UTF-8 I assume. We shouldn't be using non ascii characters here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that simple. You need to carefully examine the code. Add a comment to the definition of the variable.
…soft#148) * Correct tidy warnings for MIGraphX factory * add #pragma once * lintrunner * cpplint * add mising namespace
| static const char kExhaustiveTune[] = "ORT_MIGRAPHX_EXHAUSTIVE_TUNE"; | ||
|
|
||
| }; // namespace migraphx_env_vars | ||
| constexpr auto kFP16Enable = "ORT_MIGRAPHX_FP16_ENABLE"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to follow this guideline: https://abseil.io/tips/168 You may directly use std::string_view instead of absl::string_view since we have C++17.
Or, I think you need to declare them as inline static constexpr const char*.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't that be covered by the auto here though? We're enforcing style checks that aren't in the liner?
Just want some clarity on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, auto is discouraged to use except in the cases where the type is very obvious to see. https://google.github.io/styleguide/cppguide.html#Type_deduction
"Use type deduction only if it makes the code clearer to readers who aren't familiar with the project, or if it makes the code safer. Do not use it merely to avoid the inconvenience of writing an explicit type."
I think there it is not that obvious. The keyword inline is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but again, why is this style comment not part of your linter profile checks then? This isn't a technical comment but style and if you want to enforce that, make it part of your lint runner checks/lint stage.
| save_compiled_path_ = info.save_model_file; | ||
| load_compiled_model_ = info.load_compiled_model; | ||
| load_compiled_path_ = info.load_model_file; | ||
| model_cache_path_ = info.model_cache_dir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are welcomed to use std::filesystem::path, every time when you assign a value to this type you need carefully check the source string's encoding.
On Windows we expect the source string is in native code page encoding. Otherwise if the string is encoded as UTF-8 you must manually convert it to std::wstring before assign it to a path.
Or you may change type of model_cache_dir to const ORTCHAR_T*.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd, these came from windows side changes and we haven't seen this effecting our runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if you have a std::wstring or ORT_CHART* string, you have no problem. Otherwise try to explicitly state the encoding. Otherwise people will assume the string is a UTF-8 string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Windows You should never assign a UTF-8 string to a std::filesystem::path.
| if (load_compiled_model_ && !load_model_path_env.empty()) { | ||
| load_compiled_path_ = load_model_path_env; | ||
| LOGS_DEFAULT(WARNING) << "\nORT_MIGRAPHX_LOAD_COMPILED_PATH: " << load_compiled_path_; | ||
| const auto model_cache_path_env = GetEnvironmentVar(migraphx_env_vars::kModelCachePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the type explicit, and check the encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked, onnxruntime::GetEnvironmentVar returns an ANSI string, not a UTF-8 string. As you are define a new var, it's better to state it there this string is not a UTF-8 string(while most std::string variables are).
| } | ||
|
|
||
| static bool getMIGraphXType(ONNXTensorElementDataType type, | ||
| static bool getMIGraphXType(const ONNXTensorElementDataType type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the const
| namespace onnxruntime { | ||
|
|
||
| namespace migraphx_provider_option { | ||
| constexpr auto kDeviceId = "device_id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the above. Consider using std::string_view instead.
| void* GetInfo() override { return &g_info; } | ||
|
|
||
| std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory(int device_id) override { | ||
| std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory(const int device_id) override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the const.
| migx_options.migraphx_save_model_path = internal_options.save_model_file.c_str(); | ||
| migx_options.migraphx_load_compiled_model = internal_options.load_compiled_model; | ||
| migx_options.migraphx_load_model_path = internal_options.load_model_file.c_str(); | ||
| migx_options.migraphx_cache_dir = internal_options.model_cache_dir.string().c_str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.string().c_str() returns a temporary pointer I believe. You need to store the result of internal_options.model_cache_dir.string() at somewhere.
| &module) != 0) { | ||
| char buffer[MAX_PATH]; | ||
| if(GetModuleFileName(module, buffer, sizeof(buffer)) != 0) { | ||
| PathRemoveFileSpec(buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to see if std::filesystem has a function for doing so. Then you will not need to use the low level Windows APIs.
| GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, | ||
| static_cast<LPCSTR>(static_cast<void*>(InitializeRegistry)), | ||
| &module) != 0) { | ||
| char buffer[MAX_PATH]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always use wide Windows APIs when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be split between windows/linux builds then around #ifdefs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean when a Windows API has an ANSI version and Wide version, you need to use the Wide version.
The buffer here should be std::wstring.
| void Initialize() override { | ||
| #ifdef _WIN32 | ||
| HMODULE module = nullptr; | ||
| if(GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change usually is not needed. Usually it prefers the folder of the executable module that LoadLibraryEx is loading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change is required for AMD MIGraphX, not for the MIGraphX Execution Provider. The MIGraphX DLL loads the target DLLs (i.e., MIGraphX GPU target) at runtime. It does not use an absolute path and expects the target DLLs to be in the same directory as the MIGraphX DLL.
For example, we bundle MIGraphX binaries with MIGraphX EP Python WHL. Without this change, the MIGraphX does not know where to look for the GPU target. For Python, the executable working path is usually the location of a Python script or the Python executable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there is something wrong with the LoadLibrary calls
For Python, the executable working path is usually the location of a Python script or the Python executable.
It's not about working dir. It's more about DLL search order. We should never search that directory.
88dbc51 to
071ca51
Compare
|
@snnn rolled back latest commit since that was the cause of a few issues. Let me know if you have any more review comments. |
|
Thanks. I don't have other comments. |
|
Sorry, one more thing: onnxruntime/tools/ci_build/github/linux/docker/migraphx-ci-pipeline-env.Dockerfile needs to be updated:
|
b265693 to
071ca51
Compare
|
Closing this out as our windows side team will be upstreaming their changes instead of me bundling everything together. @snnn I'll be adding my own set of changes from Linux side in a separate PR, ensure the DML stuff is split out and make sure anything going in ROCm 7.0 linux side is up to parity. @apwojcik is the point of contact for Windows changes and will be opening up integration off this branch with fixes. |
|
Hi there! We haven't cut the release branch for this version yet, so I'm removing the |
A set of changes required for WCR/WindowsML that were added to the MIGraphX Execution provider. The development was done in the ROCm repository, now we want to sync with the main branch with a single drop. The PR incorporates the review comments from the previous closed PR #25338. Motivation and Context Fixes, changes, and updates to MIGraphX EP that have been done for ROCm development. Pushing this back upstream to ensure mainline onnxruntime is using the latest changes. Moving forward, MIGraphX EP will be cut from the latest official release tag as a base point while also adding additional features that will be contributed back. --------- Co-authored-by: urpetkov-amd <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Scott McKay <[email protected]>
A set of changes required for WCR/WindowsML that were added to the MIGraphX Execution provider. The development was done in the ROCm repository, now we want to sync with the main branch with a single drop. The PR incorporates the review comments from the previous closed PR #25338. Motivation and Context Fixes, changes, and updates to MIGraphX EP that have been done for ROCm development. Pushing this back upstream to ensure mainline onnxruntime is using the latest changes. Moving forward, MIGraphX EP will be cut from the latest official release tag as a base point while also adding additional features that will be contributed back. --------- Co-authored-by: urpetkov-amd <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Scott McKay <[email protected]>
A set of changes required for WCR/WindowsML that were added to the MIGraphX Execution provider. The development was done in the ROCm repository, now we want to sync with the main branch with a single drop. The PR incorporates the review comments from the previous closed PR microsoft#25338. Motivation and Context Fixes, changes, and updates to MIGraphX EP that have been done for ROCm development. Pushing this back upstream to ensure mainline onnxruntime is using the latest changes. Moving forward, MIGraphX EP will be cut from the latest official release tag as a base point while also adding additional features that will be contributed back. --------- Co-authored-by: urpetkov-amd <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Scott McKay <[email protected]>
A set of changes required for WCR/WindowsML that were added to the MIGraphX Execution provider. The development was done in the ROCm repository, now we want to sync with the main branch with a single drop. The PR incorporates the review comments from the previous closed PR microsoft#25338. Motivation and Context Fixes, changes, and updates to MIGraphX EP that have been done for ROCm development. Pushing this back upstream to ensure mainline onnxruntime is using the latest changes. Moving forward, MIGraphX EP will be cut from the latest official release tag as a base point while also adding additional features that will be contributed back. --------- Co-authored-by: urpetkov-amd <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Scott McKay <[email protected]>
Set of changes that were added to generate ROCm 7.0 branch pertaining to the MIGraphX Execution provider. Upstreams a bunch of changes
Adds the following features
Contrib Operator support:
groupnorm
nhwcconv
attention
skip_layernorm
Changes to EP flags
save/load has been removed for model_cache_dir
bf16 quantizer support + appropriate EP flags
hashing of saved .mxr file names based on gpu target, model name, and IO
Additional fixes involving renaming, tidy and other checks
Motivation and Context
Fixes, changes and updates to MIGraphX EP thats been done for ROCm development. Pushing this back upstream to ensure mainline onnxruntime is using the latest changes. Moving forward MIGraphX EP will be cut from the latest official release tag as a base point while also adding additional features that will be contributed back.