Conversation

@zhiyuan1i

@zhiyuan1i zhiyuan1i commented Jul 20, 2024

Description:
This pull request introduces two main changes aimed at improving XPU support and enhancing compatibility with Intel hardware:

  1. Improved XPU environment detection and version reporting:

    • When XPU environment variables are detected, the local oneAPI version is now incorporated into the PyTorch version information.
    • This enhancement facilitates better tracking and diagnosis of XPU-related issues, improving the accuracy and relevance of version information.
  2. Introduction of Intel cppmodule:

    • Due to compiler differences between Intel, AMD, and CUDA, a dedicated class and function have been added to handle Intel hardware-specific functionality.
    • This implementation is based on Intel's version in the IPEX (Intel Extension for PyTorch) library.
    • The new class and function provide improved compatibility and optimization, particularly when using Intel hardware.

These changes are designed to enhance PyTorch's performance and compatibility on Intel hardware while maintaining support for other platforms. Through these modifications, we aim to provide a better development experience and runtime performance for users utilizing Intel XPU.
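
For illustration, here is a minimal sketch of how the new helpers might be wired into a downstream setup.py, by analogy with the existing CUDAExtension/BuildExtension flow (the exact argument surface is still under review, so treat everything beyond the DPCPPExtension and IntelDpcppBuildExtension names as assumptions):

```python
# Hedged sketch only: not the final API surface of this PR.
from setuptools import setup
from torch.utils.cpp_extension import DPCPPExtension, IntelDpcppBuildExtension

setup(
    name='my_xpu_ext',  # hypothetical package name
    ext_modules=[
        # name and sources follow the usual setuptools.Extension convention
        DPCPPExtension('my_xpu_ext', ['my_xpu_ext.cpp']),
    ],
    cmdclass={'build_ext': IntelDpcppBuildExtension},
)
```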

Testing:

  • While specific test scripts have not been written yet, the build has been tested in other Intel DeepSpeed environments to ensure compatibility and functionality.
  • Full integration tests and dedicated test scripts will be developed and added in a follow-up PR.

Next steps:

  • Develop a comprehensive test suite for the new XPU features and the Intel cppmodule.
  • Gather performance metrics to quantify improvements on Intel hardware.

I welcome any feedback, suggestions, or requests for additional information. Thank you for your time and consideration!

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Jul 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131276

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f021519 with merge base 9df8ea1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jul 20, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@ezyang ezyang requested a review from albanD July 20, 2024 20:51
@dvrogozh
Contributor

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 (xpu folks)

# Run from the pytorch directory after cloning
# For Intel GPU support, please explicitly `export USE_XPU=1` before running command.
USE_XPU=1 make triton
export USE_XPU=1
Contributor

I like the previous variant more. USE_XPU=1 is currently needed for make triton, and the old version emphasized that. While it can also be set for setup.py develop below, it's not required there: the XPU environment will be detected automatically if configured properly.

ROCM_HOME = _find_rocm_home()
HIP_HOME = _join_rocm_home('hip') if ROCM_HOME else None
IS_HIP_EXTENSION = True if ((ROCM_HOME is not None) and (torch.version.hip is not None)) else False
IS_XPU_EXTENSION = True if torch.version.xpu is not None else False
Contributor

"EXTENSION" does not sound right. XPU is official backend in PyTorch, not extension. Was naming inherited from IPEX?

Collaborator

Should be OK.
Here EXTENSION is used to distinguish the build target from the CUDA/HIP/... extensions, not to describe XPU itself.

Author

@gujinghui @dvrogozh Here EXTENSION distinguishes the build target: the cpp extension is built for XPU, just like the CUDA/HIP ones. It does not refer to IPEX.

@gujinghui gujinghui requested a review from guangyey July 23, 2024 14:39
@albanD albanD added the triaged label Jul 23, 2024
def _accepted_compilers_for_platform() -> List[str]:
# gnu-c++ and gnu-cc are the conda gcc compilers
return ['clang++', 'clang'] if IS_MACOS else ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang']
if IS_XPU_EXTENSION:
Collaborator

This will affect the logic of the CPU code path, which should keep using ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang'].
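
A minimal sketch of one way to avoid that, assuming this PR's IS_XPU_EXTENSION flag and icpx as the SYCL compiler (illustrative only, not the author's actual fix):

```python
from typing import List

def _accepted_compilers_for_platform() -> List[str]:
    # Divert to the SYCL compiler only when building an XPU extension,
    # leaving the CPU/CUDA compiler list untouched.
    if IS_XPU_EXTENSION:
        return ['icpx']  # assumption: oneAPI DPC++ compiler driver
    # gnu-c++ and gnu-cc are the conda gcc compilers
    return ['clang++', 'clang'] if IS_MACOS else ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang']
```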


return setuptools.Extension(name, sources, *args, **kwargs)

class IntelDpcppBuildExtension(build_ext):
Collaborator

Could you roll this class into BuildExtension?

oneapi_link_args += [f"-L{x}" for x in library_dirs]
# oneapi_link_args += ['-fsycl-device-code-split=per_kernel']
oneapi_link_args += ["-Wl,--start-group"]
oneapi_link_args += [f"{x}" for x in _get_one_api_help().get_onemkl_libraries()]
Collaborator

In my opinion, we should not link oneMKL and oneDNN by default here. We should only link the additional libraries libc10_xpu.so and libtorch_xpu.so. For the other libraries, let users link them themselves by passing them via extra_compile_args and extra_link_args.
@gujinghui any idea?
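
For reference, a hedged sketch of the user-side opt-in being proposed (library names and flags are illustrative):

```python
# If oneMKL/oneDNN were not linked by default, a user who needs them
# could pass them explicitly:
ext = DPCPPExtension(
    'my_ext', ['my_ext.cpp'],
    extra_compile_args=['-fsycl'],
    extra_link_args=['-lmkl_sycl', '-ldnnl'],  # user-chosen oneMKL/oneDNN libraries
)
```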

Collaborator

If I remember correctly, CUDA does link cuDNN and cuBLAS in its cpp_extension?

Collaborator

@EikanWang EikanWang left a comment

Wow... @uniartisan, we really appreciate your contributions to Intel GPU. I quickly had a glance at your PR. It is great!!! So far, I think the only open question is how to support separate compilation: we use host compilers such as gcc/clang to build host-only source files, and the SYCL compiler to build source files that contain SYCL kernels. It is hard to tell the two apart by file suffix, because there is no dedicated suffix for SYCL files.


def get_xpu_version() -> Optional[str]:
try:
result = subprocess.run(['icpx', '-fsycl', '--version'], capture_output=True, text=True)
Collaborator

For the XPU build, we use separate compilation.

  • For host-only code, we use a host compiler such as gcc or clang.
  • For code that contains SYCL kernels, we use the SYCL compiler.

And we use the host linker to link the resulting objects.

As for the compiler for the cpp extension, whether to use icpx to build all of the code is an open question right now.
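
A rough illustration of that separate-compilation flow (everything below is a sketch; only the host-compiler/SYCL-compiler split reflects the description above):

```python
import subprocess

def build_mixed_extension(host_sources, sycl_sources, output='my_ext.so'):
    objs = []
    for src in host_sources:  # host-only code: host compiler (e.g. g++)
        obj = src + '.o'
        subprocess.run(['g++', '-fPIC', '-c', src, '-o', obj], check=True)
        objs.append(obj)
    for src in sycl_sources:  # SYCL kernels: SYCL compiler
        obj = src + '.o'
        subprocess.run(['icpx', '-fsycl', '-fPIC', '-c', src, '-o', obj], check=True)
        objs.append(obj)
    # Host linker links the mixed objects (SYCL runtime libraries elided).
    subprocess.run(['g++', '-shared', *objs, '-o', output], check=True)
```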

Collaborator

@uniartisan, we are discussing the requirement with the SYCL compiler community. I will keep you posted and update the status here as soon as we reach a conclusion. Thanks for your patience, and sorry for the late update.

"CppExtension", "CUDAExtension", "include_paths", "library_paths", "load", "load_inline", "is_ninja_available",
"verify_ninja_availability", "remove_extension_h_precompiler_headers", "get_cxx_compiler", "check_compiler_is_gcc"]
"verify_ninja_availability", "remove_extension_h_precompiler_headers", "get_cxx_compiler", "check_compiler_is_gcc",
"get_dpcpp_complier", "DPCPPExtension", "IntelDpcppBuildExtension"]
Collaborator

Suggested change
"get_dpcpp_complier", "DPCPPExtension", "IntelDpcppBuildExtension"]
"get_xpu_complier", "XPUExtension", "XPUBuildExtension"]

Collaborator

@guangyey , please check why XPU requires get_xpu_compiler while CUDA does not.



def include_paths(cuda: bool = False) -> List[str]:
def include_paths(cuda: bool = False, xpu: Optional[bool] = None) -> List[str]:
Collaborator

We should add an impl function like def _include_paths_for(device_name: str).
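
A minimal sketch of the suggested refactor (the signature and the private helpers below are hypothetical; only _get_one_api_help comes from this PR):

```python
from typing import List

def _include_paths_for(device_name: str = 'cpu') -> List[str]:
    paths = _common_include_paths()       # hypothetical: shared torch/C++ include dirs
    if device_name == 'cuda':
        paths += _cuda_include_paths()    # hypothetical: CUDA-specific dirs
    elif device_name == 'xpu':
        paths += _get_one_api_help().get_include_dirs()  # this PR's oneAPI helper
    return paths

def include_paths(cuda: bool = False, xpu: bool = False) -> List[str]:
    # Public wrapper keeps the existing boolean signature for compatibility.
    return _include_paths_for('cuda' if cuda else 'xpu' if xpu else 'cpu')
```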

@EikanWang EikanWang added the ciflow/xpu Run XPU CI tasks label Jul 24, 2024
@pytorch-bot

pytorch-bot bot commented Jul 24, 2024

Please seek CI approval before scheduling CIFlow labels

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Jul 24, 2024
This reverts commit 36bccb1.
@ezyang ezyang removed their request for review July 25, 2024 02:37
Contributor

@dvrogozh dvrogozh left a comment

FYI, I accidentally attempted to rebuild torchvision against pytorch with this patch. Build of torchvision fails with:

$ python3 setup.py develop
/home/dvrogozh/git/pytorch/vision/setup.py:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import DistributionNotFound, get_distribution, parse_version
Torchvision build configuration:
FORCE_CUDA = False
FORCE_MPS = False
DEBUG = False
USE_PNG = True
USE_JPEG = True
USE_NVJPEG = True
NVCC_FLAGS = None
USE_FFMPEG = True
USE_VIDEO_CODEC = True
TORCHVISION_INCLUDE = []
TORCHVISION_LIBRARY = []
IS_ROCM = False
BUILD_CUDA_SOURCES = False
Building wheel torchvision-0.20.0a0+09077a8
Building _C extension
Traceback (most recent call last):
  File "/home/dvrogozh/git/pytorch/vision/setup.py", line 525, in <module>
    make_C_extension(),
  File "/home/dvrogozh/git/pytorch/vision/setup.py", line 179, in make_C_extension
    return Extension(
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 972, in CppExtension
    library_dirs += library_paths()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 1244, in library_paths
    paths += _get_one_api_help().get_library_dirs()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2625, in _get_one_api_help
    oneAPI = _one_api_help()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2548, in __init__
    self.check_onednn_cfg()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2558, in check_onednn_cfg
    raise RuntimeError("Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh ")
RuntimeError: Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh

@zhiyuan1i
Author

zhiyuan1i commented Aug 2, 2024

> FYI, I accidentally attempted to rebuild torchvision against pytorch with this patch. Build of torchvision fails with: [...] RuntimeError: Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh

Intel builds are a bit more involved, and I want to make sure you follow these steps:

  1. https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html
  2. source ${ONEAPI_ROOT}/setvars.sh, normally /opt/intel/oneapi/latest/setvars.sh
  3. The full requirements can be found at https://pytorch.org/docs/stable/notes/get_start_xpu

Hope this helps 😊 I'll try to rebuild torchvision later; I haven't tried it before, so there may be a problem here. It looks like oneAPI may be missing from the path. You can try installing intel-oneapi-base first, if I remember correctly.

@dvrogozh
Contributor

dvrogozh commented Aug 2, 2024

@uniartisan: I regularly build the pytorch xpu backend and follow the prerequisites. Pytorch itself builds fine with or without your change. But with this PR applied, torchvision fails to build; without this PR applied to pytorch it has no issue.

@zhiyuan1i
Author

> @uniartisan: I regularly build the pytorch xpu backend and follow the prerequisites. Pytorch itself builds fine with or without your change. But with this PR applied, torchvision fails to build; without this PR applied to pytorch it has no issue.

Solution:

sudo apt install intel-oneapi-dnnl-devel # for Debian-based distros

Then source /opt/intel/oneapi/setvars.sh again and rebuild torchvision; it now builds fine:

Using /home/xxx/miniconda3/envs/intel/lib/python3.11/site-packages
Finished processing dependencies for torchvision==0.20.0a0+61bd547

@dvrogozh
Contributor

dvrogozh commented Aug 2, 2024

What should be installed is described at https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html:

sudo apt install intel-for-pytorch-gpu-dev intel-pti-dev

And this does not bring in intel-oneapi-dnnl-devel. If it's actually needed, then it's a missing dependency of one of the above packages.

Another aspect: is the code you are changing used at pytorch runtime? If it is, then dependencies on development packages are not expected.

@zhiyuan1i
Author

zhiyuan1i commented Aug 3, 2024

> What should be installed is described at https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html [...] Another aspect: is the code you are changing used at pytorch runtime? If it is, then dependencies on development packages are not expected.

In my opinion, this code is only used when building Intel XPU extensions. It is not used when building PyTorch itself or at pytorch runtime.
The document only describes the packages needed for PyTorch 2.5 itself, so it omits this one.
This is similar to how building CUDA extensions requires nvcc, which is not needed at runtime either.
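
One hedged way to keep that probing build-time-only would be to gate it behind an explicit flag rather than running it unconditionally, so that plain CppExtension builds such as torchvision's never touch the oneAPI checks (sketch only; the real library_paths collects more paths than shown):

```python
from typing import List

def library_paths(cuda: bool = False, xpu: bool = False) -> List[str]:
    paths: List[str] = []  # common torch library dirs elided for brevity
    if xpu:
        # Probe the oneAPI environment (dnnl, mkl, ...) only when the caller
        # explicitly requests an XPU extension build.
        paths += _get_one_api_help().get_library_dirs()
    return paths
```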

Contributor

@dvrogozh dvrogozh left a comment

I have a few concerns regarding this PR.

It combines too many changes: version printouts, SYCL kernel compilation, oneDNN/oneMKL support, etc. I suggest breaking it into smaller PRs. The most essential part is SYCL kernel compilation support, so I suggest focusing on that and removing everything else for now. There are a number of things to discuss related to SYCL kernels.

From an architectural standpoint, the major thing to consider is whether we can use icpx to build all files in the extension, or whether we need to follow the current xpu pytorch approach and build only SYCL files with icpx, using the host compiler for everything else. After my own review I believe we actually can't use icpx for everything at the moment.

I also think this PR to support SYCL kernels should come with associated PRs in 3rd-party libraries to showcase the usage, and/or pytorch-side tests. Without them it's unclear whether the PR works or not.

A few other APIs are missing: load() and load_inline().

Overall, I have an alternate implementation where I tried to address the concerns expressed above. Please help review and comment on #132945. Let's try to find common ground and provide the best implementation possible.


def XPUExtension(name, sources, *args, **kwargs):
r"""
Creates a :class:`setuptools.Extension` for XPU/C++.
Contributor

It's unclear what the primary purpose of this extension class is. There is no programming language called XPU, so the documentation of this extension should talk about supporting SYCL kernels in the first place. Moreover, as currently written this extension also enables support for a few other things, such as oneDNN; that should be called out explicitly in the documentation as well.
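
A hedged sketch of docstring wording along those lines (illustrative only):

```python
def XPUExtension(name, sources, *args, **kwargs):
    r"""Create a :class:`setuptools.Extension` for C++ sources containing SYCL kernels.

    Convenience method for building extensions that target Intel GPUs (XPU).
    Note: as currently written this helper also wires in oneAPI components
    such as oneDNN; that behavior should be called out here explicitly.
    """
    return setuptools.Extension(name, sources, *args, **kwargs)
```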


return setuptools.Extension(name, sources, *args, **kwargs)

class XPUBuildExtension(build_ext):
Contributor

This class does not make sense to me. I believe the generic BuildExtension class should be extended instead.
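
A minimal sketch of that direction, assuming a marker attribute set by XPUExtension and illustrative -fsycl flag injection (BuildExtension's real internals are considerably more involved):

```python
from setuptools.command.build_ext import build_ext

class BuildExtension(build_ext):
    def build_extensions(self):
        for ext in self.extensions:
            if getattr(ext, 'is_xpu', False):  # assumed marker set by XPUExtension
                ext.extra_compile_args = list(ext.extra_compile_args or []) + ['-fsycl']
                ext.extra_link_args = list(ext.extra_link_args or []) + ['-fsycl']
        super().build_extensions()
```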

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Oct 18, 2024
@github-actions github-actions bot closed this Nov 17, 2024
Labels

module: inductor, open source, Stale, triaged
