forked from pytorch/pytorch
Fix types and warp sizes for ROCm #256
Merged: iotamudelta merged 3 commits into ROCm:types_warp_sizes_20181009 from iotamudelta:my_types_warp_sizes_20181009 on Oct 9, 2018
Conversation
lcskrishna pushed a commit to lcskrishna/pytorch that referenced this pull request on May 15, 2023:
When a tensor is resized, a reference array to its sizes may become invalid. Make a copy in advance.
<details>
<summary>ASAN report</summary>
```
=================================================================
==1115867==ERROR: AddressSanitizer: heap-use-after-free on address 0x61000013d790 at pc 0x03ff8e7da360 bp 0x03fff53c83a0 sp 0x03fff53c8390
READ of size 8 at 0x61000013d790 thread T0
#0 0x3ff8e7da35f in c10::SymInt::is_heap_allocated() const /home/user/pytorch/c10/core/SymInt.h:154
#1 0x3ff8e7da35f in c10::SymInt::maybe_as_int() const /home/user/pytorch/c10/core/SymInt.h:215
#2 0x3ff8e7d0a6d in c10::SymInt::sym_eq(c10::SymInt const&) const /home/user/pytorch/c10/core/SymInt.cpp:69
#3 0x3ff7a9ab0bd in c10::SymInt::operator==(c10::SymInt const&) const /home/user/pytorch/c10/core/SymInt.h:177
#4 0x3ff7a9aaedd in bool std::__equal<false>::equal<c10::SymInt const*, c10::SymInt const*>(c10::SymInt const*, c10::SymInt const*, c10::SymInt const*) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-
v11/bits/stl_algobase.h:1162
#5 0x3ff7a9aae4b in bool std::__equal_aux1<c10::SymInt const*, c10::SymInt const*>(c10::SymInt const*, c10::SymInt const*, c10::SymInt const*) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/
stl_algobase.h:1211
#6 0x3ff7a9aae05 in bool std::__equal_aux<c10::SymInt const*, c10::SymInt const*>(c10::SymInt const*, c10::SymInt const*, c10::SymInt const*) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/s
tl_algobase.h:1219
#7 0x3ff7a9aad97 in bool std::equal<c10::SymInt const*, c10::SymInt const*>(c10::SymInt const*, c10::SymInt const*, c10::SymInt const*) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_alg
obase.h:1556
#8 0x3ff4b23c771 in c10::ArrayRef<c10::SymInt>::equals(c10::ArrayRef<c10::SymInt>) const /home/user/pytorch/c10/util/ArrayRef.h:188
#9 0x3ff4cb91bc1 in bool c10::operator!=<c10::SymInt>(c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>) /home/user/pytorch/c10/util/ArrayRef.h:341
#10 0x3ff6d1b57ff in torch::ADInplaceOrView::resize_(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) /home/user/pytorch/torch/csrc/autograd/Variab
leTypeManual.cpp:408
#11 0x3ff6d1e59c7 in c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c1
0::MemoryFormat>), &torch::ADInplaceOrView::resize_>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>
> >::operator()(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) /home/user/pytorch/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13
#12 0x3ff6d1e59c7 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10:
:ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>), &torch::ADInplaceOrView::resize_>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::Sy
mInt>, c10::optional<c10::MemoryFormat> > >, at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::Disp
atchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) /home/user/pytorch/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:480
#13 0x3ff51ca5129 in at::Tensor const& c10::callUnboxedKernelFunction<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(void*, c10::OperatorKernel*,
c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>&&, c10::optional<c10::MemoryFormat>&&) /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:50
#14 0x3ff51ca6e8f in at::Tensor const& c10::KernelFunction::call<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, c10::D
ispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:90
#15 0x3ff51ca6e8f in at::Tensor const& c10::Dispatcher::redispatch<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(c10::TypedOperatorHandle<at::Ten
sor const& (at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)
const /home/user/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:656
#16 0x3ff5182006b in c10::TypedOperatorHandle<at::Tensor const& (at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::redispatch(c10::DispatchKeySet, at::Tensor const&, c
10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const /home/user/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:492
#17 0x3ff5182006b in at::_ops::resize_::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) aten/src/ATen/Operators_4.cpp:2144
#18 0x3ff6d1d5e07 in at::redispatch::resize__symint(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) aten/src/ATen/RedispatchFunctions.h:2847
#19 0x3ff6d1bbb67 in torch::autograd::VariableType::(anonymous namespace)::resize_(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) /home/user/pyto
rch/torch/csrc/autograd/VariableTypeManual.cpp:243
#20 0x3ff6d1bd197 in c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c1
0::MemoryFormat>), &torch::autograd::VariableType::(anonymous namespace)::resize_>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10
::optional<c10::MemoryFormat> > >::operator()(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) /home/user/pytorch/aten/src/ATen/core/boxing/impl/WrapFu
nctionIntoFunctor.h:13
#21 0x3ff6d1bd197 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10:
:ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>), &torch::autograd::VariableType::(anonymous namespace)::resize_>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor
const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> > >, at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::call(c
10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) /home/user/pytorch/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor
.h:480
#22 0x3ff51ca5129 in at::Tensor const& c10::callUnboxedKernelFunction<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(void*, c10::OperatorKernel*,
c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>&&, c10::optional<c10::MemoryFormat>&&) /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:50
#23 0x3ff5181ead1 in at::Tensor const& c10::KernelFunction::call<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, c10::D
ispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:90
#24 0x3ff5181ead1 in at::Tensor const& c10::Dispatcher::call<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(c10::TypedOperatorHandle<at::Tensor co
nst& (at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)> const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const /home/user/pytorch/at
en/src/ATen/core/dispatch/Dispatcher.h:639
#25 0x3ff5181ead1 in c10::TypedOperatorHandle<at::Tensor const& (at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::call(at::Tensor const&, c10::ArrayRef<c10::SymInt>,
c10::optional<c10::MemoryFormat>) const /home/user/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:487
#26 0x3ff5181ead1 in at::_ops::resize_::call(at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) aten/src/ATen/Operators_4.cpp:2137
#27 0x3ff79b44fcf in at::Tensor::resize__symint(c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const aten/src/ATen/core/TensorBody.h:2452
#28 0x3ff79a802db in torch::autograd::THPVariable_resize_(_object*, _object*, _object*)::$_0::operator()(at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const /home/us
er/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:13417
#29 0x3ff7999f1eb in torch::autograd::THPVariable_resize_(_object*, _object*, _object*) /home/user/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:13419
#30 0x3ffa2c9b009 in method_vectorcall_VARARGS_KEYWORDS Objects/descrobject.c:344
#31 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#32 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#33 0x3ffa2e05447 in call_function Python/ceval.c:5891
#34 0x3ffa2dff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
#35 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#36 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#37 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#38 0x3ffa2c8ab15 in PyVectorcall_Call Objects/call.c:255
#39 0x3ffa2c8ac65 in _PyObject_Call Objects/call.c:290
#40 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#41 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#42 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#43 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#44 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#45 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#46 0x3ffa2c8ab15 in PyVectorcall_Call Objects/call.c:255
#47 0x3ffa2c8ac65 in _PyObject_Call Objects/call.c:290
#48 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#49 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#50 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#51 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#52 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#53 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#54 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#55 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#56 0x3ffa2e05447 in call_function Python/ceval.c:5891
#57 0x3ffa2dff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
#58 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#59 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#60 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#61 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#62 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#63 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#64 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#65 0x3ffa2e05447 in call_function Python/ceval.c:5891
#66 0x3ffa2dff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#67 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#68 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#69 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#70 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#71 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#72 0x3ffa2e05447 in call_function Python/ceval.c:5891
#73 0x3ffa2dff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
#74 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#75 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#76 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#77 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#78 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#79 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#80 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#81 0x3ffa2e05447 in call_function Python/ceval.c:5891
#82 0x3ffa2dffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#83 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#84 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#85 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#86 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#87 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#88 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#89 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#90 0x3ffa2e05447 in call_function Python/ceval.c:5891
#91 0x3ffa2dffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#92 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#93 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#94 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#95 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#96 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#97 0x3ffa2c8ab9b in PyVectorcall_Call Objects/call.c:267
#98 0x3ffa2c8ac65 in _PyObject_Call Objects/call.c:290
#99 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#100 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#101 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#102 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#103 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#104 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#105 0x3ffa2c8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#106 0x3ffa2c8b271 in _PyObject_Call_Prepend Objects/call.c:431
#107 0x3ffa2d3f307 in slot_tp_call Objects/typeobject.c:7494
#108 0x3ffa2c8a933 in _PyObject_MakeTpCall Objects/call.c:215
#109 0x3ffa2df0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
#110 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#111 0x3ffa2e05447 in call_function Python/ceval.c:5891
#112 0x3ffa2dffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#113 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#114 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#115 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#116 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#117 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#118 0x3ffa2e05447 in call_function Python/ceval.c:5891
#119 0x3ffa2dff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
#120 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#121 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#122 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#123 0x3ffa2c8ab15 in PyVectorcall_Call Objects/call.c:255
#124 0x3ffa2c8ac65 in _PyObject_Call Objects/call.c:290
#125 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#126 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#127 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#128 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#129 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#130 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#131 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#132 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#133 0x3ffa2e05447 in call_function Python/ceval.c:5891
#134 0x3ffa2dff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#135 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#136 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#137 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#138 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#139 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#140 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#141 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#142 0x3ffa2e05447 in call_function Python/ceval.c:5891
#143 0x3ffa2dff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#144 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#145 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#146 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#147 0x3ffa2c8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#148 0x3ffa2c8b271 in _PyObject_Call_Prepend Objects/call.c:431
#149 0x3ffa2d3f307 in slot_tp_call Objects/typeobject.c:7494
#150 0x3ffa2c8ad17 in _PyObject_Call Objects/call.c:305
#151 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#152 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#153 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#154 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#155 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#156 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#157 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#158 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#159 0x3ffa2e05447 in call_function Python/ceval.c:5891
#160 0x3ffa2dff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#161 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#162 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#163 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#164 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#165 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#166 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#167 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#168 0x3ffa2e05447 in call_function Python/ceval.c:5891
#169 0x3ffa2dffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#170 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#171 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#172 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#173 0x3ffa2c8ab15 in PyVectorcall_Call Objects/call.c:255
#174 0x3ffa2c8ac65 in _PyObject_Call Objects/call.c:290
#175 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#176 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#177 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#178 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#179 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#180 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#181 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#182 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#183 0x3ffa2e05447 in call_function Python/ceval.c:5891
#184 0x3ffa2dff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#185 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#186 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#187 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#188 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#189 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#190 0x3ffa2e05447 in call_function Python/ceval.c:5891
#191 0x3ffa2dffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#192 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#193 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#194 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#195 0x3ffa2c8ab15 in PyVectorcall_Call Objects/call.c:255
#196 0x3ffa2c8ac65 in _PyObject_Call Objects/call.c:290
#197 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#198 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#199 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#200 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#201 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#202 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#203 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#204 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#205 0x3ffa2e05447 in call_function Python/ceval.c:5891
#206 0x3ffa2dff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#207 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#208 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#209 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#210 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#211 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#212 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#213 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#214 0x3ffa2e05447 in call_function Python/ceval.c:5891
#215 0x3ffa2dff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#216 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#217 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#218 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#219 0x3ffa2c8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#220 0x3ffa2c8b271 in _PyObject_Call_Prepend Objects/call.c:431
#221 0x3ffa2d3f307 in slot_tp_call Objects/typeobject.c:7494
#222 0x3ffa2c8a933 in _PyObject_MakeTpCall Objects/call.c:215
#223 0x3ffa2df0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
#224 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#225 0x3ffa2e05447 in call_function Python/ceval.c:5891
#226 0x3ffa2dffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#227 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#228 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#229 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#230 0x3ffa2c8ab15 in PyVectorcall_Call Objects/call.c:255
#231 0x3ffa2c8ac65 in _PyObject_Call Objects/call.c:290
#232 0x3ffa2c8ada9 in PyObject_Call Objects/call.c:317
#233 0x3ffa2e059c7 in do_call_core Python/ceval.c:5943
#234 0x3ffa2dffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#235 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#236 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#237 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#238 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#239 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#240 0x3ffa2e05447 in call_function Python/ceval.c:5891
#241 0x3ffa2dff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#242 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#243 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#244 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#245 0x3ffa2c8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#246 0x3ffa2c8eddd in method_vectorcall Objects/classobject.c:53
#247 0x3ffa2df00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#248 0x3ffa2df013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#249 0x3ffa2e05447 in call_function Python/ceval.c:5891
#250 0x3ffa2dff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#251 0x3ffa2df052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#252 0x3ffa2e02b67 in _PyEval_Vector Python/ceval.c:5065
#253 0x3ffa2c8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#254 0x3ffa2c8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#255 0x3ffa2c8b271 in _PyObject_Call_Prepend Objects/call.c:431
#256 0x3ffa2d3f307 in slot_tp_call Objects/typeobject.c:7494
#257 0x3ffa2c8a933 in _PyObject_MakeTpCall Objects/call.c:215
0x61000013d790 is located 80 bytes inside of 192-byte region [0x61000013d740,0x61000013d800)
freed by thread T0 here:
#0 0x3ffa3237de5 in operator delete(void*) /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:160
#1 0x3ff8e7e3221 in c10::TensorImpl::~TensorImpl() /home/user/pytorch/c10/core/TensorImpl.cpp:75
previously allocated by thread T0 here:
#0 0x3ffa323734f in operator new(unsigned long) /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:99
#1 0x3ff4aeeb3d1 in c10::intrusive_ptr<c10::TensorImpl, c10::detail::intrusive_target_default_null_type<c10::TensorImpl> > c10::intrusive_ptr<c10::TensorImpl, c10::detail::intrusive_target_default_nul
l_type<c10::TensorImpl> >::make<c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::DispatchKeySet&, caffe2::TypeMeta&>(c10::intrusive_ptr<c10::S
torageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >&&, c10::DispatchKeySet&, caffe2::TypeMeta&) /home/user/pytorch/c10/util/intrusive_ptr.h:498
#2 0x3ff76f79e17 (/home/user/pytorch/build/lib.linux-s390x-cpython-310/torch/lib/libtorch_cpu.so+0x2fb79e17)
SUMMARY: AddressSanitizer: heap-use-after-free /home/user/pytorch/c10/core/SymInt.h:154 in c10::SymInt::is_heap_allocated() const
Shadow bytes around the buggy address:
0x100c2000027aa0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x100c2000027ab0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x100c2000027ac0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x100c2000027ad0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x100c2000027ae0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x100c2000027af0: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
0x100c2000027b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x100c2000027b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100c2000027b20: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x100c2000027b30: 00 00 00 00 04 fa fa fa fa fa fa fa fa fa fa fa
0x100c2000027b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1115867==ABORTING
```
</details>
<details>
<summary>Additional backtraces (not full)</summary>
Memory deallocation:
```
#0 operator delete (ptr=0x61000013d740) at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:160
ROCm#1 0x000003ffa77e3222 in c10::TensorImpl::~TensorImpl (this=0x61000013d740) at /home/user/pytorch/c10/core/TensorImpl.cpp:75
ROCm#2 0x000003ff63e76e8c in c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_ (this=0x3ffd7ec8230) at /home/user/pytorch/c10/util/intrusive_ptr.h:291
ROCm#3 0x000003ff63e76910 in c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::~intrusive_ptr (this=0x3ffd7ec8230) at /home/user/pytorch/c10/util/intrusive_ptr.h:370
ROCm#4 0x000003ff63e67240 in at::TensorBase::~TensorBase (this=0x3ffd7ec8230) at /home/user/pytorch/aten/src/ATen/core/TensorBase.h:80
ROCm#5 0x000003ff63e85ee0 in at::Tensor::~Tensor (this=0x3ffd7ec8230) at aten/src/ATen/core/TensorBody.h:90
ROCm#6 0x000003ff63f67304 in resize__functionalization (dispatchKeySet=..., self=..., size=..., memory_format=...) at /home/user/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:173
ROCm#7 0x000003ff63f89258 in c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>), &(resize__functionalization(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>))>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat> > >::operator()(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>) (
this=0x6030000390a0, args=..., args=..., args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13
ROCm#8 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>), &(resize__functionalization(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>))>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat> > >, at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>) (functor=0x6030000390a0, dispatchKeySet=..., args=..., args=...,
args=...) at /home/user/pytorch/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:480
#9 0x000003ff6aca560a in c10::callUnboxedKernelFunction<at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat> > (
unboxed_kernel_func=0x3ff63f88a80 <c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tenso
r const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>), &(resize__functionalization(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>))>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat> > >, at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::optional<c10::MemoryFormat>)>, functor=0x6030000390a0,
dispatchKeySet=..., args=..., args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:50
#10 0x000003ff6aca715c in c10::KernelFunction::call<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> > (this=0x6210005e1b28, opHandle=...,
dispatchKeySet=..., args=..., args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:96
#11 c10::Dispatcher::redispatch<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(c10::TypedOperatorHandle<at::Tensor const& (at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const (
this=0x3ff919400e0 <c10::Dispatcher::realSingleton()::_singleton>, op=..., currentDispatchKeySet=..., args=..., args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:656
#12 0x000003ff6a82006c in c10::TypedOperatorHandle<at::Tensor const& (at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const (
this=0x3ff919a07e0 <at::_ops::resize_::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)::op>, currentDispatchKeySet=..., args=...,
args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:492
#13 at::_ops::resize_::redispatch (dispatchKeySet=..., self=..., size=..., memory_format=...) at /home/user/pytorch/build/aten/src/ATen/Operators_4.cpp:2144
#14 0x000003ff861d5e08 in at::redispatch::resize__symint (dispatchKeySet=..., self=..., size=..., memory_format=...) at aten/src/ATen/RedispatchFunctions.h:2847
#15 0x000003ff861b579e in torch::ADInplaceOrView::resize_ (ks=..., self=..., size=..., optional_memory_format=...) at /home/user/pytorch/torch/csrc/autograd/VariableTypeManual.cpp:401
```
Memory access:
```
#0 c10::SymInt::maybe_as_int (this=0x61000013d790) at /home/user/pytorch/c10/core/SymInt.h:215
#1 0x000003ff734d0a6e in c10::SymInt::sym_eq (this=0x61000013d790, sci=...) at /home/user/pytorch/c10/core/SymInt.cpp:69
#2 0x000003ff5f6ab0be in c10::SymInt::operator== (this=0x61000013d790, o=...) at /home/user/pytorch/c10/core/SymInt.h:177
#3 0x000003ff5f6aaede in std::__equal<false>::equal<c10::SymInt const*, c10::SymInt const*> (__first1=0x61000013d790, __last1=0x61000013d7a0, __first2=0x602000015c30)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_algobase.h:1162
#4 0x000003ff5f6aae4c in std::__equal_aux1<c10::SymInt const*, c10::SymInt const*> (__first1=0x61000013d790, __last1=0x61000013d7a0, __first2=0x602000015c30)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_algobase.h:1211
#5 0x000003ff5f6aae06 in std::__equal_aux<c10::SymInt const*, c10::SymInt const*> (__first1=0x61000013d790, __last1=0x61000013d7a0, __first2=0x602000015c30)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_algobase.h:1219
#6 0x000003ff5f6aad98 in std::equal<c10::SymInt const*, c10::SymInt const*> (__first1=0x61000013d790, __last1=0x61000013d7a0, __first2=0x602000015c30)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_algobase.h:1556
#7 0x000003ff2ff3c772 in c10::ArrayRef<c10::SymInt>::equals (this=0x3ffed7c9900, RHS=...) at /home/user/pytorch/c10/util/ArrayRef.h:188
#8 0x000003ff31891bc2 in c10::operator!=<c10::SymInt> (a1=..., a2=...) at /home/user/pytorch/c10/util/ArrayRef.h:341
#9 0x000003ff51eb5800 in torch::ADInplaceOrView::resize_ (ks=..., self=..., size=..., optional_memory_format=...) at /home/user/pytorch/torch/csrc/autograd/VariableTypeManual.cpp:408
#10 0x000003ff51ee59c8 in c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c
10::MemoryFormat>), &torch::ADInplaceOrView::resize_>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>
> >::operator()(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) (this=0x6030007dca40, args=..., args=..., args=..., args=...)
at /home/user/pytorch/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13
#11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt
>, c10::optional<c10::MemoryFormat>), &torch::ADInplaceOrView::resize_>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<
c10::MemoryFormat> > >, at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tenso
r const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) (functor=0x6030007dca40, dispatchKeySet=..., args=..., args=..., args=...)
at /home/user/pytorch/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:480
#12 0x000003ff369a512a in c10::callUnboxedKernelFunction<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> > (
unboxed_kernel_func=0x3ff51ee51f0 <c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor const& (c10::DispatchKeySet, at::Tenso
r const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>), &torch::ADInplaceOrView::resize_>, at::Tensor const&, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::Ar
rayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> > >, at::Tensor const& (c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKern
el*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>, functor=0x6030007dca40, dispatchKeySet=..., args=..., args=..., args=...)
at /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:50
#13 0x000003ff369a6e90 in c10::KernelFunction::call<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> > (this=0x6210005e1bc8, opHandle=...,
dispatchKeySet=..., args=..., args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:90
#14 c10::Dispatcher::redispatch<at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat> >(c10::TypedOperatorHandle<at::Tensor const& (at::Tensor const&, c10::Arr
ayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const (
this=0x3ff5d6400e0 <c10::Dispatcher::realSingleton()::_singleton>, op=..., currentDispatchKeySet=..., args=..., args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:656
#15 0x000003ff3652006c in c10::TypedOperatorHandle<at::Tensor const& (at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)>::redispatch(c10::DispatchKeySet, at::Tensor const&,
c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>) const (
this=0x3ff5d6a07e0 <at::_ops::resize_::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::MemoryFormat>)::op>, currentDispatchKeySet=..., args=...,
args=..., args=...) at /home/user/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:492
#16 at::_ops::resize_::redispatch (dispatchKeySet=..., self=..., size=..., memory_format=...) at /home/user/pytorch/build/aten/src/ATen/Operators_4.cpp:2144
#17 0x000003ff51ed5e08 in at::redispatch::resize__symint (dispatchKeySet=..., self=..., size=..., memory_format=...) at aten/src/ATen/RedispatchFunctions.h:2847
#18 0x000003ff51ebbb68 in torch::autograd::VariableType::(anonymous namespace)::resize_ (ks=..., self=..., size=..., optional_memory_format=...)
at /home/user/pytorch/torch/csrc/autograd/VariableTypeManual.cpp:243
```
</details>
While there, fix a copy-paste error.
Pull Request resolved: pytorch#101064
Approved by: https://github.com/Skylion007, https://github.com/albanD
akashveramd pushed a commit that referenced this pull request on Jun 13, 2025
This PR adds support for Llama3 8b/70b. Mainly it:
- adds the tiktokenizer and instructions to download the tokenizer
- adds options for the llama model to support Llama3
- adds Llama3 8b/70b configs
amd-sriram added a commit that referenced this pull request on Aug 12, 2025
Commit Messages:
- update the param_id calculation so that it works on both CPX and SPX modes (#271) (#272)
- reset parameters for FusedDenseGeluDense, similar to FusedDense, to make test_gelu pass (#269) (#270). Co-authored-by: Sriram Kumar <[email protected]>
- Fix build error (#263)
- Fix warp size (#256)
  * replace C10_WARP_SIZE in fused rope
  * replace C10_WARP_SIZE in fused softmax
  * replace C10_WARP_SIZE in group batch norm
  * replace C10_WARP_SIZE in multihead attention
  * replace C10_WARP_SIZE in transducer
  * replace C10_WARP_SIZE in xentropy
  * replace C10_WARP_SIZE in sync batch normalization
  * replace C10_WARP_SIZE in group batch norm
  * replace warp_size in multihead attention
- Disabling AITER installation in the default build (#254)
  * added a flag to switch AITER compilation on/off via --aiter when installing apex
  * added information on building AITER during installation to the readme
- Replaced warpSize with C10_WARP_SIZE (#249)
- correct the approach to reach the apex folder from the test file (#248)
- Apex extensions import test (#245)
  * add a test that extracts the extensions from setup.py and checks that they can be imported
  * moved the test outside tests/L0
- Fix the C10_WARP_SIZE issue by replacing the macros with at::cuda::warp_size() (#237)
- Replacing C10_WARP_SIZE with platform-based warp_size values (#228). Fixes: https://ontrack-internal.amd.com/browse/SWDEV-541725
- [master] Added AITER as a submodule and use it in fused_rope.py (#222)
  * Added aiter support in fused_rope.py for all 4 variants. Updated the fused rope test, reducing tolerances according to the unit test in the aiter repo.
  * Add aiter as a submodule and install it on ROCm. Switch on the aiter backend when running on ROCm with aiter installed.
  * add pandas to the requirements so that aiter can be used without the numpy error "ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject"
  * Replace the ROCM_HOME condition with IS_ROCM_PYTORCH for installing aiter, and use `pip install -e .` instead of `python setup.py develop` to install aiter.
  * Create apex and aiter subclasses for the four variants of FusedRoPEFunc and select the apex or aiter subclass based on the AITER_ROPE_BACKEND value. The user can set the environment variable USE_ROCM_AITER_ROPE_BACKEND to choose between the aiter and apex backends for fused rope.
  * If the AITER backend is selected, use lowered precision in the unit test; otherwise use the original precision of 1e-3.
  * warn the user about the lower precision when using the aiter backend for fused rope
  * Update fused_rope.py: remove stray spaces
  * simplify the switch between the aiter and apex subclasses
  * install aiter without editable mode
- Merge pull request #227 from ROCm/amd/dev/iassiour/SWDEV-541770: Do not use warpSize as a constexpr in nhwc_batch_norm_kernel.h
- Do not use warpSize as a constexpr in nhwc_batch_norm_kernel.h. In ROCm 7.0, the warpSize variable is no longer constexpr; this commit replaces its uses with the correct value for the architecture we are running on.
- change the epilogue parameter for the hipBLASLt matmul in the cuda kernel for fused dense gelu dense (#223). Fixes: https://ontrack-internal.amd.com/browse/SWDEV-534531
- Reset the torch default device to cpu after running the amp unit tests (#220)
- Fix unit tests for transformer, fused dense, mlp (#218)
  * Fix fused_dense_gelu_dense: change the parameter names so that they can be accessed by the test appropriately
  * Update the absolute tolerances in test_mlp from 0 and 1e-7 to 1e-5
  * Deactivate the amp state handle for optimization levels other than O0; this lets the UT pass
  * Update the condition for deactivating the amp state handle from "opt level equal to 1" to "opt level not equal to 0"
  * Update the torch set-default-dtype call to remove a warning
  * Update the method that creates the overflow buffer for the amp optimizer
  * reset the default device to cpu so that the generator uses cuda, as the run_amp tests set it to cuda
- Update the fused layer norm code from the upstream apex repo (#215). The intra-warp reduction code inside the cuWelfordMuSigma2() function in the layer norm kernel assumes a warp size of 32, so a ROCm condition was added to support the GPU warp size (based on earlier apex code). For ROCm, the thread size is also adjusted, based on earlier apex code.
- upgrade matplotlib to resolve a setuptools_scm error (#213). The error: File "/tmp/easy_install-_pfhn8pn/matplotlib-3.5.1/.eggs/setuptools_scm-8.3.1-py3.12.egg/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject: `section = defn.get(tool, {})[tool_name]` raises `KeyError: 'setuptools_scm'`. Solution: https://github.com/matplotlib/matplotlib/blob/v3.8.x/pyproject.toml#L22. matplotlib 3.8 is the first version whose pyproject.toml has the tool.setuptools_scm section that this higher version of setuptools expects in the python packages it installs; matplotlib 3.5.1 does not satisfy this, so the requirement is changed to matplotlib>=3.8.
- Update distributed fused adam: integrate pipeline operations and support different grad (#207)
  * Fix `DistributedFusedAdam` for grad dtype != param dtype (#1893)
  * Pipeline `reduce-scatter` and `all-reduce` (#1895)
  Co-authored-by: Tailing Yuan <[email protected]> and Wil Kong <[email protected]>
- Update the condition for building the NCCL allocator: PyTorch must be greater than or equal to 2.6 (#204)
- Update version.txt (#203): change the version from 1.7.0 to 1.8.0
- [ROCm] Use at::empty to manage workspace memory to avoid hip runtime calls (#197). Optimizes memory for the fused_weight_gradient_mlp_cuda module.
- Update README.md (#198): add release notes for release/1.5, 1.6 and 1.7
- Update README.md (#196): update the supported versions for apex 1.7.0

PRs:
- https://github.com/ROCm/apex/pull/1895

Fixes:
- https://example.com/issue-271
- https://example.com/issue-249
- https://example.com/issue-254
- https://example.com/issue-228
- https://example.com/issue-263
- https://example.com/issue-223
- https://example.com/issue-237
- https://example.com/issue-203
- https://example.com/issue-256
- https://example.com/issue-245
- https://example.com/issue-272
- https://example.com/issue-204
- https://ontrack-internal.amd.com/browse/SWDEV-540029
- https://example.com/issue-222
- https://example.com/issue-220
- https://example.com/issue-248
- https://example.com/issue-1893
- https://example.com/issue-198
- https://example.com/issue-215
- https://example.com/issue-213
- https://example.com/issue-1895
- https://example.com/issue-218
- https://example.com/issue-227
- https://example.com/issue-196
- https://example.com/issue-197
- https://example.com/issue-207