Description
🐛 Describe the bug
`AOTIModelPackageLoader::run` is designed to dispatch to the `run` method of the device-specific container runner: `AOTIModelContainerRunnerCuda`, `AOTIModelContainerRunnerCpu`, or the `AOTIModelContainerRunnerXpu` that I'm implementing. The `runner_` member is selected by device from the container runners above, and is declared as
`std::unique_ptr<AOTIModelContainerRunner> runner_;`.
`AOTIModelContainerRunner` is the base class of `AOTIModelContainerRunnerCuda`, `AOTIModelContainerRunnerCpu`, and `AOTIModelContainerRunnerXpu`.
We expected that when the device is CUDA, calling `runner_->run()` would invoke `AOTIModelContainerRunnerCuda::run`, but it actually calls `AOTIModelContainerRunner::run`.
This happens for two reasons:
- The `run` method in the base class is not declared `virtual`.
- The function signatures differ between the base class and the derived class:
  - base: `std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs, AOTInductorStreamHandle cuda_stream_handle = nullptr);`
  - derived: `std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs);`
Because of these differences, the derived `run` hides the base `run` rather than overriding it, so `runner_->run()` always executes the base class implementation.
This causes problems, especially for GPU backends that need a stream, because the stream parameter is always `nullptr` inside `AOTIModelContainerRunner::run`. For CUDA, when the stream is `nullptr` the CUDA API falls back to the current stream, so it happens to work. For XPU, however, the API crashes when the stream is `nullptr`, resulting in a null pointer dereference.
@desertfire Sorry, I'm not sure who to assign this issue to, so I assigned it to you; please feel free to re-assign it to the appropriate developer. Thanks.
Versions
Collecting environment information...
PyTorch version: 2.6.0a0+git8a80cee
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.0
Libc version: glibc-2.35
Python version: 3.9.20 | packaged by conda-forge | (main, Sep 30 2024, 17:49:10) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-PCIE-40GB
Nvidia driver version: 550.120
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
cc @ezyang @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4