Fix wrapper subclass serialization with custom sizes / strides #137030

jbschlosser · 2024-09-30T19:01:48Z

Stack from ghstack (oldest at bottom):

Fixes #130154

This PR takes the strategy outlined in the above issue and clears out any cached sizes / strides PyCapsules before serialization. This affects the default subclass serialization logic.

The PyCapsule issue also affects `deepcopy`, so that's fixed here as well.

Note: I originally tried using a context manager to remove the cached PyCapsules and restore them after serialization, but in practice the state returned from `_reduce_ex_internal()` references the actual `tensor.__dict__`, so the problem persists once the cached values are restored. Instead, we have to be careful to remove the cached values in the right place so they're not re-cached when pulling out size / stride information for serialization.
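To make the note above concrete, here is a minimal sketch of the two approaches in plain Python. It is not the code from this PR and does not use torch: `FakeWrapper`, `CACHE_KEY`, `cache_removed`, `reduce_state`, and `can_pickle` are hypothetical names, and a `threading.Lock` stands in for the unpicklable cached PyCapsule. The only assumption carried over from the description is that the serialized state is the object's own `__dict__` rather than a copy.

```python
import contextlib
import pickle
import threading

CACHE_KEY = "_cached_sizes_strides"  # hypothetical attribute name


class FakeWrapper:
    """Stand-in for a wrapper subclass; not a real torch.Tensor."""

    def __init__(self, sizes):
        self.sizes = tuple(sizes)

    def strides(self):
        # Reading metadata lazily caches an unpicklable object in __dict__,
        # playing the role of the cached PyCapsule.
        self.__dict__.setdefault(CACHE_KEY, threading.Lock())
        strides, step = [], 1
        for size in reversed(self.sizes):
            strides.append(step)
            step *= size
        return tuple(reversed(strides))


@contextlib.contextmanager
def cache_removed(obj):
    """The rejected approach: drop the cache, then restore it on exit."""
    saved = obj.__dict__.pop(CACHE_KEY, None)
    try:
        yield
    finally:
        if saved is not None:
            obj.__dict__[CACHE_KEY] = saved


def reduce_state(obj):
    """Return (metadata, state); state is obj.__dict__ itself, not a copy."""
    metadata = (obj.sizes, obj.strides())  # reading strides may re-cache
    obj.__dict__.pop(CACHE_KEY, None)      # so clear the cache afterwards
    return metadata, obj.__dict__


def can_pickle(value):
    """Report whether pickle.dumps succeeds on value."""
    try:
        pickle.dumps(value)
        return True
    except TypeError:
        return False


w = FakeWrapper((2, 3))
w.strides()  # populate the cache

# Context-manager approach: the state is captured while the cache is gone,
# but it is the same dict, so the restored cache reappears inside it.
with cache_removed(w):
    state = w.__dict__
restored_ok = can_pickle(state)

# Clear-in-the-right-place approach: read metadata first, then clear.
metadata, state = reduce_state(w)
cleared_ok = can_pickle(state)
```

In this sketch the context-manager variant fails to pickle because the restored entry lands back in the very dict that was handed out as state, while clearing after the metadata read leaves the state clean.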


pytorch-bot · 2024-09-30T19:01:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137030


✅ No Failures

As of commit bdd1d61 with merge base 0ccd39a ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…ides" Fixes #130154 This PR takes the strategy outlined in the above issue and clears out any cached sizes / strides PyCapsules before serialization. This affects the default subclass serialization logic. The PyCapsule issue also affects `deepcopy`, so that's fixed here as well. Note: I originally tried using a context manager to remove the cached PyCapsules and restore them after serialization, but in practice the state returned from `_reduce_ex_internal()` references the actual `tensor.__dict__`, so the problem persists once the cached values are restored. Instead, we have to be careful to remove the cached values in the right place so they're not re-cached when pulling out size / stride information for serialization.

torch/testing/_internal/common_subclass.py

torch/_tensor.py


albanD

SGTM!

jbschlosser · 2024-10-02T16:04:05Z

@pytorchbot merge

pytorchmergebot · 2024-10-02T16:05:52Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Fixes #129366

Since NJT has custom serialization logic, we need an NJT-specific fix to clear out cached sizes / strides PyCapsules. Eventually, we should switch NJT to use the default serialization logic, but this depends on #125622 being addressed.

This PR also makes serialization more complete by explicitly handling `lengths`, `ragged_idx`, and the `metadata_cache`, so that it works for both contiguous and non-contiguous NJTs.

Pull Request resolved: #137031
Approved by: https://github.com/soulitzer
ghstack dependencies: #137030

Called out via torchrec integration: `lengths` is not handled properly.

Future work (not related to non-contiguous NJTs): #137275

Pull Request resolved: #137124
Approved by: https://github.com/soulitzer
ghstack dependencies: #137030, #137031

Fix wrapper subclass serialization with custom sizes / strides

03c1cb0


jbschlosser mentioned this pull request Sep 30, 2024

Fix NJT serialization #137031

Closed

jbschlosser requested review from albanD, bdhirsh and mikaylagawarecki September 30, 2024 19:05

jbschlosser added the topic: not user facing topic category label Sep 30, 2024

albanD reviewed Oct 1, 2024

torch/testing/_internal/common_subclass.py (outdated, resolved)

torch/testing/_internal/common_subclass.py (outdated, resolved)

torch/_tensor.py (outdated, resolved)

jbschlosser mentioned this pull request Oct 1, 2024

Support record_stream() for NJT #137099

Closed

jbschlosser requested a review from albanD October 1, 2024 16:13

This was referenced Oct 1, 2024

Fix to() on non-contiguous NJTs #137124

Closed

Allow any single non-batch dim to be ragged for NJT #137125

Closed

Introduce non-contiguous NJT OpInfo SampleInputs #137126

Closed

albanD reviewed Oct 1, 2024

torch/_tensor.py (outdated, resolved)

torch/_tensor.py (outdated, resolved)

mikaylagawarecki previously approved these changes Oct 1, 2024


albanD approved these changes Oct 1, 2024


pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 2, 2024

pytorchmergebot added the merging label Oct 2, 2024

pytorchmergebot added the Merged label Oct 2, 2024

pytorchmergebot closed this in 6374a19 Oct 2, 2024

pytorchmergebot removed the merging label Oct 2, 2024

github-actions bot deleted the gh/jbschlosser/182/head branch November 6, 2024 02:06


5 participants
