-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Avoid weak_ptr::lock() when shared_ptr is provably live; more error checking. #22998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…hecking. In the original iteration of the patch, I used lock() everywhere to minimize the amount of code I have to modify. In this patch, I now eliminate as many lock()s as I can, when the caller is known to have a strong reference to the PyFunction, and pass that directly. Along the way, I also bulk up our error messages for checking the result of the weak pointer dereference. Some of these cases can be triggered by zany use of legacy autograd function API; might as well let people know what they've done wrong. Signed-off-by: Edward Z. Yang <[email protected]>
…hecking. In the original iteration of the patch, I used lock() everywhere to minimize the amount of code I have to modify. In this patch, I now eliminate as many lock()s as I can, when the caller is known to have a strong reference to the PyFunction, and pass that directly. Along the way, I also bulk up our error messages for checking the result of the weak pointer dereference. Some of these cases can be triggered by zany use of legacy autograd function API; might as well let people know what they've done wrong. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 66b1aa2 Pull Request resolved: #22998
…ore error checking." In the original iteration of the patch, I used lock() everywhere to minimize the amount of code I have to modify. In this patch, I now eliminate as many lock()s as I can, when the caller is known to have a strong reference to the PyFunction, and pass that directly. Along the way, I also bulk up our error messages for checking the result of the weak pointer dereference. Some of these cases can be triggered by zany use of legacy autograd function API; might as well let people know what they've done wrong. Signed-off-by: Edward Z. Yang <[email protected]>
…hecking. In the original iteration of the patch, I used lock() everywhere to minimize the amount of code I have to modify. In this patch, I now eliminate as many lock()s as I can, when the caller is known to have a strong reference to the PyFunction, and pass that directly. Along the way, I also bulk up our error messages for checking the result of the weak pointer dereference. Some of these cases can be triggered by zany use of legacy autograd function API; might as well let people know what they've done wrong. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 296f2e6 Pull Request resolved: #22998
|
|
||
| namespace pybind11 { namespace detail { | ||
|
|
||
| // handle Python <-> torch::autograd::Function conversions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure if this is used anywhere, but it doesn't seem to cause anything to fail when I delete it.
colesbury
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The accessors from Python need to properly handle the case where cdata is expired. For example, THPFunction_metadata.
Note that this isn't limited to "legacy" autograd functions. You can grab a grad_fn attribute off a variable and have it live longer than the variable and the rest of the autograd graph.
I find this difficult to review separately from #22983, since the earlier PR introduces undesirable behavior which is mostly fixed up here. I would find it easier to review if the two were combined.
OK. The best way I could think to do this is to move metadata to live on THPFunction rather than PyFunction. Does this sound reasonable to you? I'll try this change tomorrow.
I'm happy to squash. I'll do that tomorrow. |
That seems fine. I think it would also OK to raise an exception (but not an internal assertion), return an empty value, or return an empty value and warn. Some tests for the behavior would be good too. |
|
Oh, well, raising an error is a lot easier to do, imma do that first :) |
|
cc'ing @albanD as you may have a better idea what to do about anomaly metadata. |
|
OK, having done some testing, I feel a lot better about not "fixing" this properly. Take a look at this test program: On my branch, you get: The reason the first call is OK but the second is not is because The correct way to fix this is to make |
|
As requested by Sam, these PR has been squashed into #22983. |
Stack from ghstack:
In the original iteration of the patch, I used lock() everywhere to minimize
the amount of code I have to modify. In this patch, I now eliminate as many
lock()s as I can, when the caller is known to have a strong reference to the
PyFunction, and pass that directly.
Along the way, I also bulk up our error messages for checking the result
of the weak pointer dereference. Some of these cases can be triggered
by zany use of legacy autograd function API; might as well let people know
what they've done wrong.
Signed-off-by: Edward Z. Yang [email protected]