# Enhance XPU support and introduce Intel cppmodule #131276
## Conversation
🔗 **Helpful Links**
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131276
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f021519 with merge base 9df8ea1.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 (xpu folks)
**docs/source/notes/get_start_xpu.rst** (Outdated)

```sh
# Run from the pytorch directory after cloning
# For Intel GPU support, please explicitly `export USE_XPU=1` before running command.
USE_XPU=1 make triton
export USE_XPU=1
```
I like the previous variant more. USE_XPU=1 is currently needed for make triton, and the old version did emphasize that. While it can also be used with setup.py develop below, it's not required there; the XPU environment will be detected automatically if configured properly.
```python
ROCM_HOME = _find_rocm_home()
HIP_HOME = _join_rocm_home('hip') if ROCM_HOME else None
IS_HIP_EXTENSION = True if ((ROCM_HOME is not None) and (torch.version.hip is not None)) else False
IS_XPU_EXTENSION = True if torch.version.xpu is not None else False
```
"EXTENSION" does not sound right. XPU is official backend in PyTorch, not extension. Was naming inherited from IPEX?
Should be OK. Here, EXTENSION is used to distinguish the build target, like the CUDA/HIP extensions, instead of describing XPU itself?
@gujinghui @dvrogozh Here, EXTENSION is meant to distinguish the build target: the cpp extension is built for XPU, just like for CUDA/HIP. It does not refer to IPEX.
```python
def _accepted_compilers_for_platform() -> List[str]:
    # gnu-c++ and gnu-cc are the conda gcc compilers
    return ['clang++', 'clang'] if IS_MACOS else ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang']
    if IS_XPU_EXTENSION:
```
This will impact the logic of the CPU code path, which should still use ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang'].
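A minimal sketch of how that could be kept separate, assuming the XPU branch only fires for XPU extension builds (the SYCL compiler names below are assumptions, not the PR's actual list):

```python
from typing import List

# IS_XPU_EXTENSION and IS_MACOS are module-level flags in
# torch/utils/cpp_extension.py; stubbed here so the sketch runs standalone.
IS_XPU_EXTENSION = False
IS_MACOS = False

def _accepted_compilers_for_platform() -> List[str]:
    # Divert to SYCL-capable compilers only for XPU extension builds,
    # leaving the CPU compiler list untouched.
    if IS_XPU_EXTENSION:
        return ['icpx', 'icx']  # assumed SYCL compiler names
    # gnu-c++ and gnu-cc are the conda gcc compilers
    return ['clang++', 'clang'] if IS_MACOS else ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang']
```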
**torch/utils/cpp_extension.py** (Outdated)

```python
    return setuptools.Extension(name, sources, *args, **kwargs)


class IntelDpcppBuildExtension(build_ext):
```
Could you roll this class into BuildExtension?
```python
oneapi_link_args += [f"-L{x}" for x in library_dirs]
# oneapi_link_args += ['-fsycl-device-code-split=per_kernel']
oneapi_link_args += ["-Wl,--start-group"]
oneapi_link_args += [f"{x}" for x in _get_one_api_help().get_onemkl_libraries()]
```
In my opinion, we should not link mkl and dnnl by default here. We should only link the additional libraries libc10_xpu.so and libtorch_xpu.so. For the other libraries, let users link them themselves by passing them via extra_compile_args and extra_link_args.
@gujinghui any idea?
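A sketch of what that proposal would look like from the extension author's side, using the PR's DPCPPExtension; the oneMKL/oneDNN link flags are illustrative opt-ins supplied by the user, not defaults:

```python
from torch.utils.cpp_extension import DPCPPExtension  # added by this PR

ext = DPCPPExtension(
    name='my_xpu_op',
    sources=['my_xpu_op.cpp'],
    # Users who need oneMKL or oneDNN opt in explicitly instead of the
    # build linking them unconditionally.
    extra_link_args=['-lmkl_sycl', '-ldnnl'],
)
```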
As I remember, CUDA does link cuDNN and cuBLAS in its cpp_extension?
**EikanWang** left a comment:
Wow... @uniartisan, we really appreciate your contributions to Intel GPU. I quickly had a glance at your PR. It is great!!! So far, I think the only open question is how to support separate compilation, because we use host compilers like gcc/clang to build host-only source files and the sycl compiler to build the source files that contain sycl kernels. And it is hard to tell the difference by file suffix, because there is no dedicated file suffix for SYCL files.
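One possible way around the missing suffix, sketched as a hypothetical API where users tag the SYCL sources explicitly (not part of this PR):

```python
from typing import List, Sequence, Tuple

def _partition_sources(
    sources: Sequence[str],
    sycl_sources: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Split `sources` into host-only and SYCL groups.

    Because no file suffix reliably identifies SYCL code, the caller
    passes `sycl_sources` explicitly (a hypothetical parameter).
    """
    sycl = set(sycl_sources)
    host_only = [s for s in sources if s not in sycl]
    return host_only, list(sycl_sources)
```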
```python
def get_xpu_version() -> Optional[str]:
    try:
        result = subprocess.run(['icpx', '-fsycl', '--version'], capture_output=True, text=True)
```
For the XPU build, we use separate compilation:

- For host-only code, we use a host compiler such as gcc or clang.
- For code that contains SYCL kernels, we use the SYCL compiler.

And we use the host linker to link the different objects. In terms of the compiler for the cpp extension, it is an open question right now whether to use icpx to build all code.
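A minimal sketch of that two-compiler flow; the commands are illustrative only (the real build goes through setuptools/ninja and the flags would differ):

```python
import subprocess
from typing import List

def build_and_link(host_sources: List[str], sycl_sources: List[str], out: str) -> None:
    objects = []
    for src in host_sources:
        obj = src + '.o'
        subprocess.check_call(['g++', '-fPIC', '-c', src, '-o', obj])  # host compiler
        objects.append(obj)
    for src in sycl_sources:
        obj = src + '.o'
        subprocess.check_call(['icpx', '-fsycl', '-fPIC', '-c', src, '-o', obj])  # SYCL compiler
        objects.append(obj)
    # The host linker links the mixed objects into one shared library.
    subprocess.check_call(['g++', '-shared', *objects, '-o', out])
```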
@uniartisan, we are discussing the requirement with the SYCL compiler community. I will keep you posted and update the status here once we conclude it. Thanks for your patience, and sorry for the late update.
**torch/utils/cpp_extension.py** (Outdated)

```diff
 "CppExtension", "CUDAExtension", "include_paths", "library_paths", "load", "load_inline", "is_ninja_available",
-"verify_ninja_availability", "remove_extension_h_precompiler_headers", "get_cxx_compiler", "check_compiler_is_gcc"]
+"verify_ninja_availability", "remove_extension_h_precompiler_headers", "get_cxx_compiler", "check_compiler_is_gcc",
+"get_dpcpp_complier", "DPCPPExtension", "IntelDpcppBuildExtension"]
```
| "get_dpcpp_complier", "DPCPPExtension", "IntelDpcppBuildExtension"] | |
| "get_xpu_complier", "XPUExtension", "XPUBuildExtension"] |
@guangyey, please check why XPU requires get_xpu_compiler while CUDA does not.
**torch/utils/cpp_extension.py** (Outdated)

```diff
-def include_paths(cuda: bool = False) -> List[str]:
+def include_paths(cuda: bool = False, xpu: Optional[bool] = None) -> List[str]:
```
We should add an impl function like `def _include_paths_for(device_name: str)`.
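A sketch of that refactor; `_join_sycl_home` is hypothetical, while `_join_cuda_home` and `_TORCH_PATH` mirror existing internals of cpp_extension.py:

```python
import os
from typing import List, Optional

def _include_paths_for(device_name: str) -> List[str]:
    # Common torch headers first (simplified), then device-specific ones.
    paths = [os.path.join(_TORCH_PATH, 'include')]
    if device_name == 'cuda':
        paths.append(_join_cuda_home('include'))
    elif device_name == 'xpu':
        paths.append(_join_sycl_home('include'))  # hypothetical helper
    return paths

def include_paths(cuda: bool = False, xpu: Optional[bool] = None) -> List[str]:
    device = 'cuda' if cuda else ('xpu' if xpu else 'cpu')
    return _include_paths_for(device)
```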
Please seek CI approval before scheduling CIFlow labels.
This reverts commit 36bccb1.
**dvrogozh** left a comment:
FYI, I accidentally attempted to rebuild torchvision against pytorch with this patch. The torchvision build fails with:

```
$ python3 setup.py develop
/home/dvrogozh/git/pytorch/vision/setup.py:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import DistributionNotFound, get_distribution, parse_version
Torchvision build configuration:
FORCE_CUDA = False
FORCE_MPS = False
DEBUG = False
USE_PNG = True
USE_JPEG = True
USE_NVJPEG = True
NVCC_FLAGS = None
USE_FFMPEG = True
USE_VIDEO_CODEC = True
TORCHVISION_INCLUDE = []
TORCHVISION_LIBRARY = []
IS_ROCM = False
BUILD_CUDA_SOURCES = False
Building wheel torchvision-0.20.0a0+09077a8
Building _C extension
Traceback (most recent call last):
  File "/home/dvrogozh/git/pytorch/vision/setup.py", line 525, in <module>
    make_C_extension(),
  File "/home/dvrogozh/git/pytorch/vision/setup.py", line 179, in make_C_extension
    return Extension(
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 972, in CppExtension
    library_dirs += library_paths()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 1244, in library_paths
    paths += _get_one_api_help().get_library_dirs()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2625, in _get_one_api_help
    oneAPI = _one_api_help()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2548, in __init__
    self.check_onednn_cfg()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2558, in check_onednn_cfg
    raise RuntimeError("Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh ")
RuntimeError: Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh
```
Intel builds are a bit more complicated, and I want to make sure you follow the prerequisites.
@uniartisan: I regularly build the pytorch xpu backend and follow the prerequisites. Pytorch itself builds fine with or without your change. But with this PR applied, torchvision fails to build; it has no issue without this PR applied to pytorch.
Solution: … and source `/opt/intel/oneapi/setvars.sh` again.
What should be installed is described at https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html, and this does not bring in dnnl. Another aspect: the code you are changing, is it used at pytorch runtime? If it is, then any dependencies on development packages are not expected.
In my opinion, this will only be used when building Intel XPU extensions. It will not be used when building PyTorch or in normal use (the pytorch runtime).
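If so, one possible fix for the torchvision failure above is to defer the oneAPI probe until an XPU build is actually requested; a sketch with a hypothetical device selector:

```python
from typing import List

def library_paths(device_type: str = 'cpu') -> List[str]:
    paths = [TORCH_LIB_PATH]  # existing module-level constant in cpp_extension.py
    if device_type == 'xpu':
        # Only probe the oneAPI environment (and potentially raise on a
        # missing dnnl root) when an XPU extension is being built.
        paths += _get_one_api_help().get_library_dirs()
    return paths
```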
**dvrogozh** left a comment:
I have a few concerns regarding this PR.

1. It combines too many changes: adding some version printouts, SYCL kernel compilation, oneDNN/oneMKL support, etc. I suggest it be broken into smaller PRs. The most essential part here is SYCL kernel compilation support, so I suggest focusing on that and removing everything else for now. There are a number of things to discuss related to SYCL kernels.
2. From an architecture standpoint, the major thing to consider is whether we can use icpx to build all files in the extension, or whether we need to follow the current xpu pytorch approach and build only the sycl files with icpx, using the host compiler for everything else. After my own review, I believe we actually can't use icpx for everything at the moment.
3. I also think this PR to support SYCL kernels should come with associated PRs in 3rd-party libraries to showcase the usage, and/or pytorch-side tests. Without them, it's unclear whether the PR works or not.
4. A few other APIs are missing: load() and load_inline() (see the sketch after this comment).

Overall, I have an alternate implementation where I tried to address the concerns expressed above. Please help review and comment on #132945. Let's try to find common ground and provide the best implementation possible.
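For reference, the missing JIT path would presumably look like the CUDA one for users; a hypothetical call, not something this PR implements:

```python
from torch.utils.cpp_extension import load

# Hypothetical: JIT-compile an extension whose sources contain SYCL kernels.
module = load(
    name='my_sycl_ext',
    sources=['my_sycl_ext.cpp'],
    verbose=True,
)
```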
```python
def XPUExtension(name, sources, *args, **kwargs):
    r"""
    Creates a :class:`setuptools.Extension` for XPU/C++.
```
It's unclear what the primary purpose of this extension class is. There is no such programming language as XPU, so the documentation for this extension should talk about supporting SYCL kernels in the first place. Further, this extension as currently written also enables support for a few other things, such as oneDNN. That should be called out explicitly in the documentation as well.
```python
    return setuptools.Extension(name, sources, *args, **kwargs)


class XPUBuildExtension(build_ext):
```
This class does not make sense to me. I believe the generic BuildExtension class should be extended instead.
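A sketch of that suggestion, with hypothetical marker and helper names, folding the XPU handling into the existing class instead of adding a parallel one:

```python
from setuptools.command.build_ext import build_ext

class BuildExtension(build_ext):
    def build_extensions(self):
        for ext in self.extensions:
            # `use_sycl` would be set by the XPU extension factory
            # (hypothetical marker attribute).
            if getattr(ext, 'use_sycl', False):
                self._swap_in_sycl_compiler(ext)  # hypothetical helper
        super().build_extensions()
```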
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Description:
This pull request introduces two main changes aimed at improving XPU support and enhancing compatibility with Intel hardware:

1. Improved XPU environment detection and version reporting.
2. Introduction of an Intel cppmodule.

These changes are designed to enhance PyTorch's performance and compatibility on Intel hardware while maintaining support for other platforms. Through these modifications, we aim to provide a better development experience and runtime performance for users utilizing Intel XPU.
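For context, a setup.py using the APIs introduced here might look like this (signatures assumed from the diff snippets above):

```python
from setuptools import setup
from torch.utils.cpp_extension import XPUExtension, XPUBuildExtension

setup(
    name='my_xpu_ext',
    ext_modules=[XPUExtension('my_xpu_ext', ['my_xpu_ext_kernel.cpp'])],
    cmdclass={'build_ext': XPUBuildExtension},
)
```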
Testing:
Next steps:
I welcome any feedback, suggestions, or requests for additional information. Thank you for your time and consideration!
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov