Conversation

@zhiyuan1i

@zhiyuan1i zhiyuan1i commented Jul 20, 2024

Description:
This pull request introduces two main changes aimed at improving XPU support and enhancing compatibility with Intel hardware:

  1. Improved XPU environment detection and version reporting:

    • When XPU environment variables are detected, the local oneAPI version is now incorporated into the PyTorch version information.
    • This enhancement facilitates better tracking and diagnosis of XPU-related issues, improving the accuracy and relevance of version information.
  2. Introduction of Intel cppmodule:

    • Due to compiler differences between Intel, AMD, and CUDA, a dedicated class and function have been added to handle Intel hardware-specific functionality.
    • This implementation is based on Intel's version in the IPEX (Intel Extension for PyTorch) library.
    • The new class and function provide improved compatibility and optimization, particularly when using Intel hardware.

These changes are designed to enhance PyTorch's performance and compatibility on Intel hardware while maintaining support for other platforms. Through these modifications, we aim to provide a better development experience and runtime performance for users utilizing Intel XPU.
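
For illustration, here is a minimal sketch of how the new helpers might be wired into a downstream setup.py, by analogy with the existing CUDAExtension/BuildExtension flow (the exact argument surface is still under review, so treat everything beyond the DPCPPExtension and IntelDpcppBuildExtension names as assumptions):

```python
# Hedged sketch only: not the final API surface of this PR.
from setuptools import setup
from torch.utils.cpp_extension import DPCPPExtension, IntelDpcppBuildExtension

setup(
    name='my_xpu_ext',  # hypothetical package name
    ext_modules=[
        # name and sources follow the usual setuptools.Extension convention
        DPCPPExtension('my_xpu_ext', ['my_xpu_ext.cpp']),
    ],
    cmdclass={'build_ext': IntelDpcppBuildExtension},
)
```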

Testing:

  • While specific test scripts have not been written yet, the build has been tested in other Intel DeepSpeed environments to ensure compatibility and functionality.
  • Full integration tests and dedicated test scripts will be developed and added in a follow-up PR.

Next steps:

  • Develop a comprehensive test suite for the new XPU features and the Intel cppmodule.
  • Gather performance metrics to quantify improvements on Intel hardware.

I welcome any feedback, suggestions, or requests for additional information. Thank you for your time and consideration!

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Jul 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131276

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f021519 with merge base 9df8ea1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jul 20, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@ezyang ezyang requested a review from albanD July 20, 2024 20:51
@dvrogozh
Contributor

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 (xpu folks)

# Run from the pytorch directory after cloning
# For Intel GPU support, please explicitly `export USE_XPU=1` before running command.
USE_XPU=1 make triton
export USE_XPU=1
Contributor

I like the previous variant more. USE_XPU=1 is currently needed for make triton, and the old version emphasized that. While it can also be set for setup.py develop below, it's not required there: the XPU environment will be detected automatically if configured properly.

ROCM_HOME = _find_rocm_home()
HIP_HOME = _join_rocm_home('hip') if ROCM_HOME else None
IS_HIP_EXTENSION = True if ((ROCM_HOME is not None) and (torch.version.hip is not None)) else False
IS_XPU_EXTENSION = True if torch.version.xpu is not None else False
Contributor

"EXTENSION" does not sound right. XPU is official backend in PyTorch, not extension. Was naming inherited from IPEX?

Collaborator

Should be OK.
Here EXTENSION is used to distinguish the build target from the CUDA/HIP/... extensions, not to describe XPU itself.

Author

@gujinghui @dvrogozh Here EXTENSION distinguishes the build target: the cpp extension is built for XPU, just like the CUDA/HIP ones. It does not refer to IPEX.

@gujinghui gujinghui requested a review from guangyey July 23, 2024 14:39
@albanD albanD added the triaged label Jul 23, 2024
def _accepted_compilers_for_platform() -> List[str]:
# gnu-c++ and gnu-cc are the conda gcc compilers
return ['clang++', 'clang'] if IS_MACOS else ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang']
if IS_XPU_EXTENSION:
Collaborator

This will affect the logic of the CPU code path, which should keep using ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang'].
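
A minimal sketch of one way to avoid that, assuming this PR's IS_XPU_EXTENSION flag and icpx as the SYCL compiler (illustrative only, not the author's actual fix):

```python
from typing import List

def _accepted_compilers_for_platform() -> List[str]:
    # Divert to the SYCL compiler only when building an XPU extension,
    # leaving the CPU/CUDA compiler list untouched.
    if IS_XPU_EXTENSION:
        return ['icpx']  # assumption: oneAPI DPC++ compiler driver
    # gnu-c++ and gnu-cc are the conda gcc compilers
    return ['clang++', 'clang'] if IS_MACOS else ['g++', 'gcc', 'gnu-c++', 'gnu-cc', 'clang++', 'clang']
```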


return setuptools.Extension(name, sources, *args, **kwargs)

class IntelDpcppBuildExtension(build_ext):
Collaborator

Could you roll this class into BuildExtension?

oneapi_link_args += [f"-L{x}" for x in library_dirs]
# oneapi_link_args += ['-fsycl-device-code-split=per_kernel']
oneapi_link_args += ["-Wl,--start-group"]
oneapi_link_args += [f"{x}" for x in _get_one_api_help().get_onemkl_libraries()]
Collaborator

In my opinion, we should not link oneMKL and oneDNN by default here. We should only link the additional libraries libc10_xpu.so and libtorch_xpu.so. For the other libraries, let users link them themselves by passing them via extra_compile_args and extra_link_args.
@gujinghui any idea?
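
For reference, a hedged sketch of the user-side opt-in being proposed (library names and flags are illustrative):

```python
# If oneMKL/oneDNN were not linked by default, a user who needs them
# could pass them explicitly:
ext = DPCPPExtension(
    'my_ext', ['my_ext.cpp'],
    extra_compile_args=['-fsycl'],
    extra_link_args=['-lmkl_sycl', '-ldnnl'],  # user-chosen oneMKL/oneDNN libraries
)
```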

Collaborator

If I remember correctly, CUDA does link cuDNN and cuBLAS in its cpp_extension?

Collaborator

@EikanWang EikanWang left a comment

Wow... @uniartisan, we really appreciate your contributions to Intel GPU. I quickly had a glance at your PR. It is great!!! So far, I think the only open question is how to support separate compilation: we use host compilers such as gcc/clang to build host-only source files, and the SYCL compiler to build source files that contain SYCL kernels. It is hard to tell the two apart by file suffix, because there is no dedicated suffix for SYCL files.


def get_xpu_version() -> Optional[str]:
try:
result = subprocess.run(['icpx', '-fsycl', '--version'], capture_output=True, text=True)
Collaborator

For the XPU build, we use separate compilation.

  • For host-only code, we use a host compiler such as gcc or clang.
  • For code that contains SYCL kernels, we use the SYCL compiler.

And we use the host linker to link the resulting objects.

As for the compiler for the cpp extension, whether to use icpx to build all of the code is an open question right now.
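
A rough illustration of that separate-compilation flow (everything below is a sketch; only the host-compiler/SYCL-compiler split reflects the description above):

```python
import subprocess

def build_mixed_extension(host_sources, sycl_sources, output='my_ext.so'):
    objs = []
    for src in host_sources:  # host-only code: host compiler (e.g. g++)
        obj = src + '.o'
        subprocess.run(['g++', '-fPIC', '-c', src, '-o', obj], check=True)
        objs.append(obj)
    for src in sycl_sources:  # SYCL kernels: SYCL compiler
        obj = src + '.o'
        subprocess.run(['icpx', '-fsycl', '-fPIC', '-c', src, '-o', obj], check=True)
        objs.append(obj)
    # Host linker links the mixed objects (SYCL runtime libraries elided).
    subprocess.run(['g++', '-shared', *objs, '-o', output], check=True)
```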

Collaborator

@uniartisan, we are discussing the requirement with the SYCL compiler community. I will keep you posted and update the status here as soon as we reach a conclusion. Thanks for your patience, and sorry for the late update.

"CppExtension", "CUDAExtension", "include_paths", "library_paths", "load", "load_inline", "is_ninja_available",
"verify_ninja_availability", "remove_extension_h_precompiler_headers", "get_cxx_compiler", "check_compiler_is_gcc"]
"verify_ninja_availability", "remove_extension_h_precompiler_headers", "get_cxx_compiler", "check_compiler_is_gcc",
"get_dpcpp_complier", "DPCPPExtension", "IntelDpcppBuildExtension"]
Collaborator

Suggested change
"get_dpcpp_complier", "DPCPPExtension", "IntelDpcppBuildExtension"]
"get_xpu_complier", "XPUExtension", "XPUBuildExtension"]

Collaborator

@guangyey , please check why XPU requires get_xpu_compiler while CUDA does not.



def include_paths(cuda: bool = False) -> List[str]:
def include_paths(cuda: bool = False, xpu: Optional[bool] = None) -> List[str]:
Collaborator

We should add an impl function like def _include_paths_for(device_name: str).
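
A minimal sketch of the suggested refactor (the signature and the private helpers below are hypothetical; only _get_one_api_help comes from this PR):

```python
from typing import List

def _include_paths_for(device_name: str = 'cpu') -> List[str]:
    paths = _common_include_paths()       # hypothetical: shared torch/C++ include dirs
    if device_name == 'cuda':
        paths += _cuda_include_paths()    # hypothetical: CUDA-specific dirs
    elif device_name == 'xpu':
        paths += _get_one_api_help().get_include_dirs()  # this PR's oneAPI helper
    return paths

def include_paths(cuda: bool = False, xpu: bool = False) -> List[str]:
    # Public wrapper keeps the existing boolean signature for compatibility.
    return _include_paths_for('cuda' if cuda else 'xpu' if xpu else 'cpu')
```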

@EikanWang EikanWang added the ciflow/xpu Run XPU CI tasks label Jul 24, 2024
@pytorch-bot

pytorch-bot bot commented Jul 24, 2024

Please seek CI approval before scheduling CIFlow labels

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Jul 24, 2024
This reverts commit 36bccb1.
@ezyang ezyang removed their request for review July 25, 2024 02:37
Contributor

@dvrogozh dvrogozh left a comment

FYI, I accidentally attempted to rebuild torchvision against pytorch with this patch. Build of torchvision fails with:

$ python3 setup.py develop
/home/dvrogozh/git/pytorch/vision/setup.py:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import DistributionNotFound, get_distribution, parse_version
Torchvision build configuration:
FORCE_CUDA = False
FORCE_MPS = False
DEBUG = False
USE_PNG = True
USE_JPEG = True
USE_NVJPEG = True
NVCC_FLAGS = None
USE_FFMPEG = True
USE_VIDEO_CODEC = True
TORCHVISION_INCLUDE = []
TORCHVISION_LIBRARY = []
IS_ROCM = False
BUILD_CUDA_SOURCES = False
Building wheel torchvision-0.20.0a0+09077a8
Building _C extension
Traceback (most recent call last):
  File "/home/dvrogozh/git/pytorch/vision/setup.py", line 525, in <module>
    make_C_extension(),
  File "/home/dvrogozh/git/pytorch/vision/setup.py", line 179, in make_C_extension
    return Extension(
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 972, in CppExtension
    library_dirs += library_paths()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 1244, in library_paths
    paths += _get_one_api_help().get_library_dirs()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2625, in _get_one_api_help
    oneAPI = _one_api_help()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2548, in __init__
    self.check_onednn_cfg()
  File "/home/dvrogozh/git/pytorch/pytorch/torch/utils/cpp_extension.py", line 2558, in check_onednn_cfg
    raise RuntimeError("Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh ")
RuntimeError: Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh

@zhiyuan1i
Author

zhiyuan1i commented Aug 2, 2024

> FYI, I accidentally attempted to rebuild torchvision against pytorch with this patch. Build of torchvision fails with: [...] RuntimeError: Didn't detect dnnl root. Please source <oneapi_dir>/dnnl/<version>/env/vars.sh

Intel builds are a bit more involved, and I want to make sure you follow these steps:

  1. https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html
  2. source ${ONEAPI_ROOT}/setvars.sh, normally /opt/intel/oneapi/latest/setvars.sh
  3. The full requirements can be found at https://pytorch.org/docs/stable/notes/get_start_xpu

Hope this helps 😊 I'll try to rebuild torchvision later; I haven't tried it before, so there may be a problem here. It looks like oneAPI may be missing from the path. You can try installing intel-oneapi-base first, if I remember correctly.

@dvrogozh
Contributor

dvrogozh commented Aug 2, 2024

@uniartisan: I regularly build the pytorch xpu backend and follow the prerequisites. Pytorch itself builds fine with or without your change. But with this PR applied, torchvision fails to build; without this PR applied to pytorch it has no issue.

@zhiyuan1i
Author

> @uniartisan: I regularly build the pytorch xpu backend and follow the prerequisites. Pytorch itself builds fine with or without your change. But with this PR applied, torchvision fails to build; without this PR applied to pytorch it has no issue.

Solution:

sudo apt install intel-oneapi-dnnl-devel # for Debian-based distros

Then source /opt/intel/oneapi/setvars.sh again and rebuild torchvision; it now builds fine:

Using /home/xxx/miniconda3/envs/intel/lib/python3.11/site-packages
Finished processing dependencies for torchvision==0.20.0a0+61bd547

@dvrogozh
Contributor

dvrogozh commented Aug 2, 2024

What should be installed is described at https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html:

sudo apt install intel-for-pytorch-gpu-dev intel-pti-dev

And this does not bring in intel-oneapi-dnnl-devel. If it's actually needed, then it's a missing dependency of one of the above packages.

Another aspect: is the code you are changing used at pytorch runtime? If it is, then dependencies on development packages are not expected.

@zhiyuan1i
Author

zhiyuan1i commented Aug 3, 2024

> What should be installed is described at https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html [...] Another aspect: is the code you are changing used at pytorch runtime? If it is, then dependencies on development packages are not expected.

In my opinion, this code is only used when building Intel XPU extensions. It is not used when building PyTorch itself or at pytorch runtime.
The document only describes the packages needed for PyTorch 2.5 itself, so it omits this one.
This is similar to how building CUDA extensions requires nvcc, which is not needed at runtime either.
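
One hedged way to keep that probing build-time-only would be to gate it behind an explicit flag rather than running it unconditionally, so that plain CppExtension builds such as torchvision's never touch the oneAPI checks (sketch only; the real library_paths collects more paths than shown):

```python
from typing import List

def library_paths(cuda: bool = False, xpu: bool = False) -> List[str]:
    paths: List[str] = []  # common torch library dirs elided for brevity
    if xpu:
        # Probe the oneAPI environment (dnnl, mkl, ...) only when the caller
        # explicitly requests an XPU extension build.
        paths += _get_one_api_help().get_library_dirs()
    return paths
```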

Contributor

@dvrogozh dvrogozh left a comment

I have a few concerns regarding this PR.

It combines too many changes: version printouts, SYCL kernel compilation, oneDNN/oneMKL support, etc. I suggest breaking it into smaller PRs. The most essential part is SYCL kernel compilation support, so I suggest focusing on that and removing everything else for now. There are a number of things to discuss related to SYCL kernels.

From an architectural standpoint, the major thing to consider is whether we can use icpx to build all files in the extension, or whether we need to follow the current xpu pytorch approach and build only SYCL files with icpx, using the host compiler for everything else. After my own review I believe we actually can't use icpx for everything at the moment.

I also think this PR to support SYCL kernels should come with associated PRs in 3rd-party libraries to showcase the usage, and/or pytorch-side tests. Without them it's unclear whether the PR works or not.

A few other APIs are missing: load() and load_inline().

Overall, I have an alternate implementation where I tried to address the concerns expressed above. Please help review and comment on #132945. Let's try to find common ground and provide the best implementation possible.


def XPUExtension(name, sources, *args, **kwargs):
r"""
Creates a :class:`setuptools.Extension` for XPU/C++.
Contributor

It's unclear what the primary purpose of this extension class is. There is no programming language called XPU, so the documentation of this extension should talk about supporting SYCL kernels in the first place. Moreover, as currently written this extension also enables support for a few other things, such as oneDNN; that should be called out explicitly in the documentation as well.
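
A hedged sketch of docstring wording along those lines (illustrative only):

```python
def XPUExtension(name, sources, *args, **kwargs):
    r"""Create a :class:`setuptools.Extension` for C++ sources containing SYCL kernels.

    Convenience method for building extensions that target Intel GPUs (XPU).
    Note: as currently written this helper also wires in oneAPI components
    such as oneDNN; that behavior should be called out here explicitly.
    """
    return setuptools.Extension(name, sources, *args, **kwargs)
```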


return setuptools.Extension(name, sources, *args, **kwargs)

class XPUBuildExtension(build_ext):
Contributor

This class does not make sense to me. I believe the generic BuildExtension class should be extended instead.
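
A minimal sketch of that direction, assuming a marker attribute set by XPUExtension and illustrative -fsycl flag injection (BuildExtension's real internals are considerably more involved):

```python
from setuptools.command.build_ext import build_ext

class BuildExtension(build_ext):
    def build_extensions(self):
        for ext in self.extensions:
            if getattr(ext, 'is_xpu', False):  # assumed marker set by XPUExtension
                ext.extra_compile_args = list(ext.extra_compile_args or []) + ['-fsycl']
                ext.extra_link_args = list(ext.extra_link_args or []) + ['-fsycl']
        super().build_extensions()
```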

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Oct 18, 2024
@github-actions github-actions bot closed this Nov 17, 2024
Labels

module: inductor, open source, Stale, triaged
