Closed
Labels: module: inductor, oncall: pt2, triaged
Description
Summary
The base repro:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1 python benchmark.py --model maxvit_nano_rw_256 --precision bfloat16 --torchcompile --bench train --no-retry -b 64
```
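(`PYTORCH_NO_CUDA_MEMORY_CACHING=1` disables the CUDA caching allocator so the faulting access is not masked by pooled allocations, and `CUDA_LAUNCH_BLOCKING=1` forces synchronous kernel launches so the error is attributed to the offending kernel.)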
This produces:
```
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /home/drisspg/meta/pytorch/c10/cuda/CUDAException.cpp:43 (most
frame #6: at::native::_efficient_attention_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, at::Tensor const&, double, at::Tensor const&, at::Tensor const&, long, bool, std::optional<double>, std::optional<long>, std::optional<long>, bool) + 0x2338 (0x7f0a1e35b418 in /home/drisspg/meta/pytorch/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x344cbc7 (0x7f0a1e44cbc7 in /home/drisspg/meta/pytorch/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x347bb1d (0x7f0a1e47bb1d in /home/drisspg/meta/pytorch/torch/lib/libtorch_cuda.so)
frame #9: at::_ops::_efficient_attention_backward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, at::Tensor const&, double, at::Tensor const&, at::Tensor const&, long, bool, std::optional<double>, std::optional<long>, std::optional<long>, bool) + 0x350 (0x7f0a2b292490 in /home/drisspg/meta/pytorch/torch/lib/libtorch_cpu.so)
frame #10: at::native::_scaled_dot_product_efficient_attention_backward_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, std::array<bool, 4ul>, bool, std::optional<double>) + 0x2b3 (0x7f0a1e1bef23 in
```
Current Debugging
Verified that this is specific to Inductor:
- No error in eager mode
- No error when compiling with the aot_eager_decomp_partitioner backend (a minimal A/B sketch follows)
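For illustration, here is a minimal sketch (mine, not the original benchmark) of this kind of backend A/B check. The toy SDPA module, the tensor shapes, and the `"aot_eager_decomp_partition"` backend string are assumptions on my part; the actual report used timm's maxvit_nano_rw_256 through benchmark.py:

```python
# Hypothetical reduced check: run the same SDPA forward+backward under
# eager, aot_eager_decomp_partition, and inductor. Shapes are made up.
import torch
import torch.nn.functional as F

class ToySDPA(torch.nn.Module):
    def forward(self, q, k, v):
        return F.scaled_dot_product_attention(q, k, v)

def run(backend=None):
    model = ToySDPA().cuda()
    fn = torch.compile(model, backend=backend) if backend else model
    # bfloat16 inputs with grads, mirroring the --precision bfloat16 train run
    q, k, v = (torch.randn(64, 8, 256, 64, device="cuda",
                           dtype=torch.bfloat16, requires_grad=True)
               for _ in range(3))
    fn(q, k, v).sum().backward()  # the crash is in the attention backward
    torch.cuda.synchronize()

run()                               # eager: no error
run("aot_eager_decomp_partition")   # no error
run("inductor")                     # reported to hit the IMA
```

Note that which attention kernel SDPA selects depends on GPU and dtype; the trace above goes through `_efficient_attention_backward`, i.e. the memory-efficient backend.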
The actual IMA is an out-of-bounds read.
Smaller-ish repro:
https://gist.github.com/drisspg/49d5bec4fdadeace206455267e1ef135
If you run this with PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1, you should hit the same IMA.
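As an additional cross-check (not part of the original report), NVIDIA's compute-sanitizer can attribute the out-of-bounds read to a specific kernel and address; `repro.py` here is a placeholder for the gist script:

```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 compute-sanitizer --tool memcheck python repro.py
```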