Skip to content

Conversation

@cyyever
Copy link
Collaborator

@cyyever cyyever commented Jun 28, 2023

Fixes ASAN stack-use-after-scope in MKLDNN.
The stack trace is

2023-06-27T16:37:20.9099950Z ==1424==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f0c5dc20980 at pc 0x7f0c61286a73 bp 0x7ffef8e76990 sp 0x7ffef8e76118
2023-06-27T16:37:20.9100054Z READ of size 24 at 0x7f0c5dc20980 thread T0
2023-06-27T16:37:20.9100327Z     #0 0x7f0c61286a72 in memcmp (/usr/lib/llvm-7/lib/clang/7.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x5da72)
2023-06-27T16:37:20.9100701Z     #1 0x7f0c2f395d0b in c10::ArrayRef<long>::equals(c10::ArrayRef<long>) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb8bd0b)
2023-06-27T16:37:20.9101196Z     #2 0x7f0c314a1bb1 in at::native::mkldnn_matmul(at::Tensor const&, at::Tensor const&, at::Tensor const&, float, float) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xec97bb1)
2023-06-27T16:37:20.9101714Z     #3 0x7f0c301f49c5 in at::native::bmm_out_or_baddbmm_(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9ea9c5)
2023-06-27T16:37:20.9102153Z     #4 0x7f0c301f85ab in at::native::structured_bmm_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9ee5ab)
2023-06-27T16:37:20.9102601Z     #5 0x7f0c32cb3cb6 in at::(anonymous namespace)::wrapper_CPU_bmm(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x104a9cb6)
2023-06-27T16:37:20.9103662Z     #6 0x7f0c32ea1f43 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &(at::(anonymous namespace)::wrapper_CPU_bmm(at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x10697f43)
2023-06-27T16:37:20.9104330Z     #7 0x7f0c3187252a in at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xf06852a)
2023-06-27T16:37:20.9104756Z     #8 0x7f0c3257e097 in at::_ops::bmm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xfd74097)
2023-06-27T16:37:20.9105237Z     #9 0x7f0c383c31c3 in torch::autograd::VariableType::(anonymous namespace)::bmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x15bb91c3)
2023-06-27T16:37:20.9106496Z     #10 0x7f0c383c25b9 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::bmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x15bb85b9)
2023-06-27T16:37:20.9106874Z     #11 0x7f0c3257da60 in at::_ops::bmm::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xfd73a60)
2023-06-27T16:37:20.9107275Z     #12 0x7f0c301fc0e2 in at::native::_matmul_impl(at::Tensor&, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9f20e2)
2023-06-27T16:37:20.9107647Z     #13 0x7f0c301f9c21 in at::native::matmul(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9efc21)
2023-06-27T16:37:20.9108853Z     #14 0x7f0c33dca7e3 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__matmul(at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x115c07e3)
2023-06-27T16:37:20.9109255Z     #15 0x7f0c32958ef0 in at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1014eef0)
2023-06-27T16:37:20.9110023Z     #16 0x7f0c2f596b62 in at::autocast::WrapFunction_<(at::autocast::CastPolicy)0, (c10::DeviceType)0, at::Tensor (at::Tensor const&, at::Tensor const&), &(at::_ops::matmul::call(at::Tensor const&, at::Tensor const&)), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcd8cb62)
2023-06-27T16:37:20.9110723Z     #17 0x7f0c2f348403 in c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, at::Tensor const&), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >::operator()(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb3e403)
2023-06-27T16:37:20.9111596Z     #18 0x7f0c2f348063 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, at::Tensor const&), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb3e063)
2023-06-27T16:37:20.9111976Z     #19 0x7f0c32958ef0 in at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1014eef0)
2023-06-27T16:37:20.9112383Z     #20 0x7f0c5803dc3e in torch::autograd::THPVariable_matmul(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x2b2cc3e)
2023-06-27T16:37:20.9112561Z warning: parsing line table prologue at 0x00000000 should have ended at 0x0000050b but it ended at 0x0000050a
2023-06-27T16:37:20.9112713Z     #21 0x5074a6 in cfunction_call (/opt/conda/envs/py_3.9/bin/python3.9+0x5074a6)
2023-06-27T16:37:20.9112857Z     #22 0x505997 in _PyObject_Call (/opt/conda/envs/py_3.9/bin/python3.9+0x505997)
2023-06-27T16:37:20.9113114Z     #23 0x505997 in PyObject_Call /croot/python-split_1684193875530/work/build-static/<invalid>:293:12
2023-06-27T16:37:20.9113258Z     #24 0x4ed302 in do_call_core (/opt/conda/envs/py_3.9/bin/python3.9+0x4ed302)
2023-06-27T16:37:20.9113633Z     #25 0x4ed302 in _PyEval_EvalFrameDefault /croot/python-split_1684193875530/work/build-static/<invalid>:3582:22
2023-06-27T16:37:20.9113780Z     #26 0x4e6729 in _PyEval_EvalFrame (/opt/conda/envs/py_3.9/bin/python3.9+0x4e6729)
2023-06-27T16:37:20.9114041Z     #27 0x4e6729 in _PyEval_EvalCode /croot/python-split_1684193875530/work/build-static/<invalid>:4329:14
2023-06-27T16:37:20.9114202Z     #28 0x4efd7d in _PyFunction_Vectorcall (/opt/conda/envs/py_3.9/bin/python3.9+0x4efd7d)

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104331

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit c3439d8:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cyyever cyyever marked this pull request as draft June 28, 2023 07:10
@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Jun 28, 2023
@cyyever
Copy link
Collaborator Author

cyyever commented Jun 28, 2023

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 28, 2023
@cyyever cyyever marked this pull request as ready for review June 28, 2023 12:23
@cyyever cyyever changed the title fix ASAN errors in MKLDNN fix an ASAN error in MKLDNN Jun 28, 2023
Copy link
Contributor

@soulitzer soulitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why this fixes the ASAN error?

@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 28, 2023
@cyyever
Copy link
Collaborator Author

cyyever commented Jun 29, 2023

@soulitzer contiguous_strides returns a DimVector, then the old function call passed it to a IntArrayRef result, which became a dangling reference. And in fact, this commit was picked from PR #104224. The error disappeared after the change.

Copy link
Contributor

@soulitzer soulitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cyyever
Copy link
Collaborator Author

cyyever commented Jun 29, 2023

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 29, 2023
@cyyever
Copy link
Collaborator Author

cyyever commented Jun 29, 2023

@pytorchmergebot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@cyyever cyyever deleted the asan_mkldnn_fix branch July 26, 2023 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: cpu CPU specific problem (e.g., perf, algorithm) open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants