
Conversation

@albanD albanD (Collaborator) commented Oct 17, 2024

This fixes 4 main issues:

  • The way the cuda sanitizer handles its state is weird. In particular, because the lifetime of the Mode is tied to the submodule, it can outlive the python runtime and other loaded modules. On my current version, it even outlives the "sys" module. Since I'm not sure about the impact of changing this lifetime handling, I'm making the exit handler a no-op when python is already dying, as there is no point cleaning up at that stage (see the sketch after this list).
  • Adds a "disable" method so that behavior can be tested after the mode has been enabled.
  • Fixes Tensor.as_subclass() to properly disable modes when creating the new Tensor object, just like we already do in make_subclass and make_wrapper_subclass. The change here just applies the exact same treatment to it.
  • Fixes Tensor.as_subclass() to not propagate autograd, as there is no valid backward associated here. We have tests that check that this behavior happens, so I guess this is not an obvious bugfix but expected behavior. Reverted that change.
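
A minimal sketch of the shutdown guard and the new disable method from the first two items. The class and method names here are illustrative rather than the exact code from this PR, and it assumes `sys.is_finalizing()` is a reliable signal that the interpreter is shutting down:

import sys

class SanitizerMode:
    """Hypothetical stand-in for the cuda sanitizer's mode object."""

    def __init__(self):
        self.enabled = False

    def enable(self):
        self.enabled = True

    def disable(self):
        # New in this PR: lets tests turn the mode back off after enabling it.
        self.enabled = False

    def __del__(self):
        # The mode's lifetime is tied to the submodule, so __del__ can run
        # during interpreter shutdown, after `sys` and other imports have
        # already been torn down. Cleaning up then is pointless, so skip
        # exiting the mode if it outlived the runtime.
        if sys is None or sys.is_finalizing():
            return
        if self.enabled:
            self.disable()

mode = SanitizerMode()
mode.enable()
mode.disable()  # behavior can now be tested after the mode was enabled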

@albanD albanD added the release notes: python_frontend python frontend release notes category label Oct 17, 2024
pytorch-bot bot commented Oct 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138218

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3d6e927 with merge base 20af56d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

# When python is exiting, the submodule and its attributes might be in a
# different state or already cleaned up.
# Similarly other imports might already have been cleaned up so `sys` might
# be already gone as well.
# Skip exiting the mode if it outlived the runtime.
Contributor

If we don't exit the mode, do we expect anything bad to happen? Or is it more like: the runtime is gone, sys is gone, exiting the mode is not a concern as everything is gone?

Collaborator Author

Exactly!

csan.enable_cuda_sanitizer()
# Two different kinds of subclass created while the sanitizer is enabled;
# this pattern used to fail before the fix.
t = TwoTensor(torch.rand(2), torch.rand(2))

t = MyT(torch.rand(2))
Contributor

I don't understand how this test case would have failed before. It looks like we expected a race when creating two different subclasses?

Collaborator Author

No, just that two different kinds of subclass should work.
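
For context, a rough Python-level illustration of what the as_subclass fix does internally (the real change is in the C++ binding; `_disable_current_modes` is used here only to illustrate the idea and assumes a recent torch build where it is available):

import torch
from torch.utils._python_dispatch import _disable_current_modes

class MyT(torch.Tensor):
    pass

t = torch.rand(2)
# Creating the subclass wrapper itself should not be visible to any active
# torch dispatch mode, mirroring what _make_subclass and
# _make_wrapper_subclass already do.
with _disable_current_modes():
    s = t.as_subclass(MyT)
assert type(s) is MyT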

@Skylion007 Skylion007 (Collaborator) left a comment

Fix

@albanD albanD (Collaborator Author) commented Oct 17, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 17, 2024
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
