[ez] Disable some failing periodic tests #156731
Conversation
@pytorchbot merge -f "s390x failures not related"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
test_torch.py::TestTorchDeviceTypeCUDA::test_storage_use_count_cuda: Added in pytorch#150059. Fails in debug mode. [GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44706020831) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/4491326fb0c0e67eca1598ae33c41cdfced2cd33)
inductor/test_inductor_freezing.py::FreezingGpuTests::test_cpp_wrapper_cuda: [GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44707119967) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/4491326fb0c0e67eca1598ae33c41cdfced2cd33). Started failing after moving to the new CUDA version (pytorch#155234).
I'll ping people if this gets merged.
Pull Request resolved: pytorch#156731
Approved by: https://github.com/huydhn
@pytorchbot cherry-pick --onto release/2.8 -c release
Cherry picking #156731: the cherry-pick PR is at #157560. The following tracker issues are updated. Details for Dev Infra team: raised by workflow job.
[ez] Disable some failing periodic tests (#156731) (cherry picked from commit 2ff3280). Co-authored-by: Catherine Lee <[email protected]>
# Motivation

#155451 decoupled `torch._C._storage_Use_Count` from CUDA and introduced a corresponding unit test:
https://github.com/pytorch/pytorch/blob/815545f2dd6ade563cb1263f8bb7813f355edb2e/test/test_torch.py#L257-L262

However, this test fails when PyTorch is built with debug assertions enabled, so @clee2000 disabled it in #156731.

The root cause is that `_cdata` is obtained from an `intrusive_ptr`, not a `weak_intrusive_ptr`. As a result, calling `c10::weak_intrusive_ptr::use_count` on it triggers the internal assertion:
https://github.com/pytorch/pytorch/blob/815545f2dd6ade563cb1263f8bb7813f355edb2e/c10/util/intrusive_ptr.h#L912-L917

For example:

```python
a = torch.randn(10, device=device)  # refcount=1, weakcount=1
prev_cf = torch._C._storage_Use_Count(a.untyped_storage()._cdata)  # violates the assertion
```

This violates the expected invariant inside `weak_intrusive_ptr::use_count`, which assumes the pointer was originally constructed from a valid `weak_intrusive_ptr`. In fact, `storage_impl` is obtained from an `intrusive_ptr`:
https://github.com/pytorch/pytorch/blob/815545f2dd6ade563cb1263f8bb7813f355edb2e/torch/csrc/Module.cpp#L2105-L2109

# Solution

Use `c10::intrusive_ptr::use_count` instead.

Pull Request resolved: #157694
Approved by: https://github.com/albanD
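As a rough illustration of the behavior the fixed binding is expected to have, here is a minimal sketch. It assumes `torch._C._storage_Use_Count` reports the strong `intrusive_ptr` refcount of the underlying `StorageImpl`; the counts shown are illustrative expectations, not a documented contract of this internal helper.

```python
import torch

# Sketch only: _storage_Use_Count is an internal helper, so the exact counts are
# an assumption about the intended post-fix semantics, not a public guarantee.
a = torch.randn(10)
count = torch._C._storage_Use_Count(a.untyped_storage()._cdata)
print(count)  # expected: 1 -- only `a` holds the StorageImpl

b = a.view(2, 5)  # `b` shares the same StorageImpl as `a`
count_shared = torch._C._storage_Use_Count(a.untyped_storage()._cdata)
print(count_shared)  # expected: count + 1 -- two tensors now reference the storage,
                     # and no debug assertion should fire along the way
```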
test_torch.py::TestTorchDeviceTypeCUDA::test_storage_use_count_cuda:
Added in #155451
Fails in debug mode: [GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44706020831) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/4491326fb0c0e67eca1598ae33c41cdfced2cd33)
inductor/test_inductor_freezing.py::FreezingGpuTests::test_cpp_wrapper_cuda:
[GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44707119967) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/4491326fb0c0e67eca1598ae33c41cdfced2cd33)
Started failing after moving to the new CUDA version (#155234).
I'll ping people if this gets merged. (A sketch of how a debug-only skip can look follows below the cc list.)
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
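For readers unfamiliar with how a test like this can end up skipped, here is a minimal, hypothetical sketch of one way to guard a test against debug builds, using plain `unittest.skipIf` together with `torch.version.debug`. The test class name is made up for illustration, and the actual PR may rely on PyTorch's own test-disabling mechanisms rather than an in-source decorator.

```python
import unittest

import torch


class TestStorageUseCount(unittest.TestCase):  # hypothetical class name, for illustration only
    # torch.version.debug is True for debug builds, where the assertion described
    # above fires; skipping there mirrors the intent of "disable failing tests".
    @unittest.skipIf(torch.version.debug, "fails under debug asserts, see pytorch/pytorch#156731")
    def test_storage_use_count(self):
        a = torch.randn(10)
        self.assertEqual(torch._C._storage_Use_Count(a.untyped_storage()._cdata), 1)


if __name__ == "__main__":
    unittest.main()
```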