Fix masked tensor test_stack memory leak #137815
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137815
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 11e67f5 with merge base 7c1d939.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The warning about requires_grad may be unrelated and can be easily fixed by removing requires_grad from the data tensor.
@cpuhrsch I see that you are the reviewer of #125262 (comment) and would appreciate it if you have any insights to share here.
@huydhn - I'm not sure why this is happening. If the fix in this PR resolves the leak in the test, I suggest we land it.
@pytorchbot merge -f 'All slow tests have passed' |
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hi, thank you for the fix! If requires_grad caused the memory leak in the test, I can look into it and manually require the grad on the masked tensor.

@cpuhrsch I did not consider it, but I was using masked tensors only for parts of my code, so gradients were passing through the masked part. Do you know where the leak may come from? Otherwise, we could manually delete the tensors in the test.

Edit: never mind, as per #137890, del may raise other issues.
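For reference, a rough sketch of the manual-cleanup idea mentioned above, assuming the test builds a MaskedTensor, stacks it, and runs backward; the function name, shapes, and body are hypothetical (not the real test_stack), and per the edit above #137890 later made this cleanup unnecessary:

```python
import torch
from torch.masked import masked_tensor

def test_stack_sketch():
    # Hypothetical stand-in for the test body, not the actual test_stack.
    data = torch.randn(2, 3)
    mask = torch.randint(0, 2, (2, 3), dtype=torch.bool)
    mt = masked_tensor(data, mask, requires_grad=True)

    out = torch.stack([mt, mt])
    out.sum().backward()

    # Drop the local references explicitly so the leak checker does not see
    # these tensors kept alive past the end of the test.
    del data, mask, mt, out
```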
Yeah, #137890 has the proper fix. The issue has always been there, I guess.
Note that this reverts the change from #137815 as well, which is not needed anymore! Without this, you create an unbreakable reference cycle. It is unbreakable because part of the cycle is through the autograd graph, which we cannot traverse.
Pull Request resolved: #137890
Approved by: https://github.com/atalman, https://github.com/huydhn, https://github.com/Skylion007
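The observable effect of such a cycle can be checked with a weak reference. A hypothetical repro sketch (not the actual PyTorch test; the construction that triggers the cycle is assumed from the discussion in this thread):

```python
import gc
import weakref

import torch
from torch.masked import masked_tensor

# Build the MaskedTensor from a tensor that already requires grad, the
# pattern the UserWarning quoted in the PR description advises against.
data = torch.randn(3, requires_grad=True)
mask = torch.tensor([True, False, True])
mt = masked_tensor(data, mask, requires_grad=True)
mt.sum().backward()

probe = weakref.ref(data)
del data, mask, mt
gc.collect()

# If the weak reference is still live here, the tensor survived every
# Python-visible deletion, i.e. something (per the comment above, the
# autograd graph) is keeping the cycle alive.
print(probe() is not None)
```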
This test is currently failing in trunk when the memory leak check is enabled, for example https://github.com/pytorch/pytorch/actions/runs/11296206361/job/31422348823#step:22:1970. When testing locally, calling backward on a masked tensor always causes a memory leak until I clean up the data and the mask manually. This is probably related to this warning from masked tensor: UserWarning: It is not recommended to create a MaskedTensor with a tensor that requires_grad. To avoid this, you can use data.clone().detach(). I don't know much about the internal details here to go further, so let's just fix the test first.

Testing

The test now passes and doesn't warn about a memory leak anymore.
The test itself came from #125262 (comment)
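The workaround that the UserWarning itself suggests looks roughly like this (a minimal sketch with made-up shapes, not the code in the test):

```python
import torch
from torch.masked import masked_tensor

data = torch.randn(2, 3, requires_grad=True)
mask = torch.randint(0, 2, (2, 3), dtype=torch.bool)

# Passing `data` directly would trigger the UserWarning quoted above,
# because `data` already requires grad:
# mt = masked_tensor(data, mask, requires_grad=True)

# Detach a copy first, then let the MaskedTensor itself require grad.
mt = masked_tensor(data.clone().detach(), mask, requires_grad=True)
```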