
[AOTI] AOTIModelPackageLoader::run dispatch to the specific device container runner does not really work. #140546

@etaf

Description


🐛 Describe the bug

AOTIModelPackageLoader::run is designed to dispatch to the run method of the corresponding device container runner: AOTIModelContainerRunnerCuda, AOTIModelContainerRunnerCpu, or the AOTIModelContainerRunnerXpu that I'm implementing. The runner_ member is selected by device from the container runners above, and it is declared as
std::unique_ptr<AOTIModelContainerRunner> runner_;

AOTIModelContainerRunner is the base class of AOTIModelContainerRunnerCuda, AOTIModelContainerRunnerCpu, and AOTIModelContainerRunnerXpu.
We expect that when the device is CUDA, calling runner_->run() will invoke AOTIModelContainerRunnerCuda::run, but it actually calls AOTIModelContainerRunner::run.

This happens for two reasons:

  • The run method in the base class is not declared as virtual.
  • The function signatures differ between the base class and the derived class:
    • base: std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs, AOTInductorStreamHandle cuda_stream_handle = nullptr);
    • derived: std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs);

Because of these differences, the two methods are not polymorphic, and runner_->run() always executes the base class implementation.

This leads to issues, especially for GPU backends that need a stream, because the stream parameter will be nullptr in AOTIModelContainerRunner::run. For CUDA, when the stream is nullptr, the CUDA API automatically falls back to the current stream, so it happens to work fine. However, for XPU, the API crashes when the stream is nullptr, resulting in a null pointer dereference.

@desertfire Sorry, I'm not sure whom to assign this issue to, so I assigned it to you; please feel free to reassign it to the corresponding developer. Thanks.

Versions

Collecting environment information...
PyTorch version: 2.6.0a0+git8a80cee
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.0
Libc version: glibc-2.35

Python version: 3.9.20 | packaged by conda-forge | (main, Sep 30 2024, 17:49:10) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-PCIE-40GB
Nvidia driver version: 550.120
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

cc @ezyang @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4

Metadata

    Labels

    module: aotinductor, oncall: pt2, triaged
