Fix cuda sanitizer and as_subclass calls #138218

albanD · 2024-10-17T14:54:13Z

This fixes 4 main issues:

The way the cuda sanitizer handle it's state is weird. In particular, because the lifetime of the Mode is linked to the submodule, then this might outlive the python runtime and other modules loaded. On my current version, this even outlives the "sys" module. Given that I'm not sure the impact of changing this lifetime handling, I'm making the exit handler a no-op when python is already dying and thus no point cleaning up.
Adds a "disable" method to be able to test after the mode is enabled.
Fix Tensor.as_sublass() to properly disable modes when creating the new Tensor object just like we already do in make_subclass and make_wrapper_subclass. The change here is just to apply the exact same treatment to it.
~~Fix Tensor.as_subclass() not to propagate autograd as there is no valid backward associated here.~~ We have test that check that this behavior happen so I guess this is not an obvious bugfix and expected behavior. Reverted that change.

pytorch-bot · 2024-10-17T14:54:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138218

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3d6e927 with merge base 20af56d ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torch/cuda/_sanitizer.py

janeyx99 · 2024-10-17T15:39:36Z

torch/cuda/_sanitizer.py

+        # different state of already cleaned up.
+        # Similarly other imports might already have been cleaned up so `sys` might
+        # be already gone as well.
+        # Skip exiting the mode if it outlived the runtime.


If we don't exit the mode, do we expect anything bad to happen? Or is it more like: the runtime is gone, sys is gone, exiting the mode is not a concern as everything is gone?

janeyx99 · 2024-10-17T15:43:05Z

test/test_cuda_sanitizer.py

+            csan.enable_cuda_sanitizer()
+            t = TwoTensor(torch.rand(2), torch.rand(2))
+
+            t = MyT(torch.rand(2))


I don't understand how this test case would have failed before. It looks like we expected a race for creating two different subclasses ?

No just two different kind of subclass should work.

Skylion007

Fix

torch/cuda/_sanitizer.py

albanD · 2024-10-17T17:56:54Z

@pytorchbot merge

pytorchmergebot · 2024-10-17T17:58:48Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

Fix cuda sanitizer and as_subclass calls

dfbf16e

albanD added the release notes: python_frontend python frontend release notes category label Oct 17, 2024

albanD requested review from colesbury, ezyang and janeyx99 October 17, 2024 14:54

albanD requested review from eqy, soulitzer and syed-ahmed as code owners October 17, 2024 14:54

janeyx99 reviewed Oct 17, 2024

View reviewed changes

Skylion007 reviewed Oct 17, 2024

View reviewed changes

torch/cuda/_sanitizer.py Outdated Show resolved Hide resolved

PR comments

3d6e927

ngimel approved these changes Oct 17, 2024

View reviewed changes

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 17, 2024

pytorchmergebot added the merging label Oct 17, 2024

pytorchmergebot added the Merged label Oct 17, 2024

pytorchmergebot closed this in 69ba89d Oct 17, 2024

pytorchmergebot removed the merging label Oct 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix cuda sanitizer and as_subclass calls #138218

Fix cuda sanitizer and as_subclass calls #138218

Uh oh!

albanD commented Oct 17, 2024 •

edited

Loading

Uh oh!

pytorch-bot bot commented Oct 17, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

janeyx99 Oct 17, 2024

Uh oh!

albanD Oct 17, 2024

Uh oh!

janeyx99 Oct 17, 2024

Uh oh!

albanD Oct 17, 2024

Uh oh!

Skylion007 left a comment

Uh oh!

Uh oh!

albanD commented Oct 17, 2024

Uh oh!

pytorchmergebot commented Oct 17, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Fix cuda sanitizer and as_subclass calls #138218

Fix cuda sanitizer and as_subclass calls #138218

Uh oh!

Conversation

albanD commented Oct 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Oct 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138218

✅ No Failures

Uh oh!

Uh oh!

Uh oh!

janeyx99 Oct 17, 2024

Choose a reason for hiding this comment

Uh oh!

albanD Oct 17, 2024

Choose a reason for hiding this comment

Uh oh!

janeyx99 Oct 17, 2024

Choose a reason for hiding this comment

Uh oh!

albanD Oct 17, 2024

Choose a reason for hiding this comment

Uh oh!

Skylion007 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

albanD commented Oct 17, 2024

Uh oh!

pytorchmergebot commented Oct 17, 2024

Merge started

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

albanD commented Oct 17, 2024 •

edited

Loading

pytorch-bot bot commented Oct 17, 2024 •

edited

Loading