-
Notifications
You must be signed in to change notification settings - Fork 26.5k
Fix lifetime of JITException binding #106401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106401
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 017bfbc: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
torch/csrc/jit/python/init.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is different from the pybind11_abseil code I pointed you to, in two ways:
py::exception overloads operator() as I gather you found, and you're pasting the implementation here. I think what you have here should work, but I see you wrote (pybind/pybind11#4769 (comment)):
I don't seem to be able to get the release()-ed py::exception to be used at all.
Does this not work?
Maybe related, maybe not:
The pybind11_abseil code silently assumes that the interpreter is only initialized once in the process, and only finalized once (always true in the environments I usually work in). If the interpreter is restarted, I don't know how to prevent undefined behavior. Really what we need for that case is that the PyObject * behind exc here is reset in the interpreter shutdown.
But is that even your situation (multiple interpreter initialize/finalize cycles)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not work?
It does not, I tried any possible variant of py::reinterpret_borrow<py::exception<JITException>>(h) and py::cast<py::exception<JITException>>(h) but there isn't any constructor on the exception class that takes a handle as input. So that fails.
Maybe it would be ok to add such a constructor? I'm not sure why it is not inherited from the object class.
error: could not convert ‘{h, pybind11::object::borrowed_t()}’ from ‘<brace-enclosed initializer list>’ to ‘pybind11::exception<torch::jit::JITException>’
But is that even your situation (multiple interpreter initialize/finalize cycles)?
No multiple interpreter from the CPython sense. And this is a problem for me even without multipy (which we can ignore for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not, I tried any possible variant
I stared at this a intensely ...
but I'm at a loss why your change doesn't work. py::exception inherits from object and doesn't add any members. All it has is m_ptr aka .ptr(). All we're doing with the .release() ...
... is to reset m_ptr in object and returning it via a new handle.
So I created this pybind11 PR: pybind/pybind11#4772
It's basically the same change as you have here. It passes with all sanitizers I have access to.
In your situation I'd fall back to experimenting with (seemingly) silly things until I hit on the root cause.
For example, undo the change above (i.e. keep the static ... line as is).
But keep the change down here.
Does it still not work?
What's the error/symptom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ho well TIL: https://softwareengineering.stackexchange.com/questions/197893/why-are-constructors-not-inherited
So if we want to be able to do py::reinterpret_borrow<py::exception<JITException>>(h) then we must add the following to the exception class in pybind:
exception(handle h, bool is_borrowed) : object(h, is_borrowed) {}
exception(handle h, borrowed_t) : object(h, borrowed_t{}) {}
exception(handle h, stolen_t) : object(h, stolen_t{}) {}
// Or if you're C++11 +
using object::object;With this change (either explicit or using-based), I can locally change from PyErr_SetString(exc.ptr(), e.what()); to py::reinterpret_borrow<py::exception<JITException>>(exc)(e.what()); and everything works!
Would you be happy with adding these constructors to the pybind base exception class (or the using statement if pybind if C++11)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we want to be able to do py::reinterpret_borrow<py::exception>(h)
I don't think that's a good direction.
I had a run-in with the py::exception design before, I think it's really unfortunate to overload operator() for this purpose.
I think it's better to provide py::set_error(py::handle, const char*). That will work for any Python exception type. And deprecate the overloaded operator().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you prefer this design, that sounds good to me.
I'll keep the PyErr_SetString here for now and I can migrate it to py::set_error when it is available in our version of pybind11 then!
torch/csrc/jit/python/init.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with PyBind, but if object is allocated that way, what would prevent Python runtime from grabage collecting it?
I know this looks ugly, but if we want to keep the object referenced but never call it'd destructor perhaps something like static auto * exc = new py::exception<JITException>(m, "JITException"); will do better? (Or I think I have a dummy unique_pointer with leaky destrutor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might actually be better!
(I guess I was too trapped in the handle idea which happens to work for our purposes just fine, I didn't really consider alternatives.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would prevent Python runtime from grabage collecting it
The reference count will never go to zero, the deallocation will never trigger.
Python doesn't keep a registry of all objects AFAIK, at least not in production configuration, therefore I think the Py_DECREF trigger is the only way there is to get to the deallocation code. Please correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like static auto * exc = new py::exception(m, "JITException"); will do better?
@malfet @albanD Could you please take a look here?
(For future reference, at the point of this writing that was pybind/pybind11@9a4eba5)
I'm having 2nd thoughts, although it's 99.99% a style opinion. (0.01% that there is an extra dynamic memory allocation with the new alternative that is not actually needed.)
My thinking:
-
Using the
py::handle ... .release();alternative combined withpy::set_error()is more concise / looks nicer IMO, seems very intuitive to me, compared to the surprising and evidently troublesomeoperator()overload. -
In retrospect, we should have added
py::set_error()a very long time ago. (There are 42PyErr_SetString()across all files in the pybind11 repo, I've not replaced them all yet.)
Please let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python doesn't keep a registry of all objects AFAIK
It does keep such a registry for all GC-able objects.
But release() leak one reference indeed. So this objects refcount will never be able to reach 0 and so will never be able to be de-allocated.
Could you please take a look here?
I agree that this is very much style. My personal vote is that avoiding the new is simpler in this context.
@malfet are you ok with me merging the current version of the PR? (I'll update to py::set_error() once we upgrade to a version of pybind that has that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me, build please add a comment saying that this object will never be GCed because of the reasons outlined above
torch/csrc/jit/python/init.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me, build please add a comment saying that this object will never be GCed because of the reasons outlined above
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…#4772) * Copy clang 17 compatibility fixes from PR #4762 to a separate PR. * static py::exception<> -> static py::handle * Add `py::set_error()` but also try the suggestion of @malfet (pytorch/pytorch#106401 (review)). * clang 17 compatibility fixes (#4767) * Copy clang 17 compatibility fixes from PR #4762 to a separate PR. * Add gcc:13 C++20 * Add silkeh/clang:16-bullseye C++20 * chore(deps): update pre-commit hooks (#4770) updates: - [github.com/psf/black: 23.3.0 → 23.7.0](psf/black@23.3.0...23.7.0) - [github.com/astral-sh/ruff-pre-commit: v0.0.276 → v0.0.281](astral-sh/ruff-pre-commit@v0.0.276...v0.0.281) - [github.com/asottile/blacken-docs: 1.14.0 → 1.15.0](adamchainz/blacken-docs@1.14.0...1.15.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * docs: Remove upper bound on pybind11 in example pyproject.toml for setuptools (#4774) * docs: Remove upper bound on pybind11 in example pyproject.toml for setuptools * Update docs/compiling.rst --------- Co-authored-by: Henry Schreiner <[email protected]> * Provide better type hints for a variety of generic types (#4259) * Provide better type hints for a variety of generic types * Makes better documentation * tuple, dict, list, set, function * Move to py::typing * style: pre-commit fixes * Update copyright line with correct year and actual author. The author information was copy-pasted from the git log output. --------- Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Use `py::set_error()` everywhere possible (only one special case, in common.h). Overload `py::set_error(py::handle, py::handle)`. Change back to `static py::handle exc = ... .release();` Deprecate `py::exception<>::operator()` * Add `PYBIND11_WARNING_DISABLE` for INTEL and MSVC (and sort alphabetically). * `PYBIND11_WARNING_DISABLE_INTEL(10441)` does not work. For ICC only, falling back to the recommended `py::set_error()` to keep the testing simple. It is troublesome to add `--diag-disable=10441` specifically for test_exceptions.cpp, even that is non-ideal because it covers the entire file, not just the one line we need it for, and the value of exercising the trivial deprecated `operator()` on this one extra platform is practically zero. * Fix silly oversight. * NVHPC 23.5.0 generates deprecation warnings. They are currently not treated as errors, but falling back to using `py::set_error()` to not have to deal with that distraction. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Keto D. Zhang <[email protected]> Co-authored-by: Henry Schreiner <[email protected]> Co-authored-by: Dustin Spicuzza <[email protected]>
Fix issues with new asserts introduced in 3.12 and pybind gil holding check on destructor. See pybind/pybind11#4769 for details on why this is a preferred solution rather than skipping the decref in all pybind object destructors. Pull Request resolved: pytorch#106401 Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/Skylion007
Fix issues with new asserts introduced in 3.12 and pybind gil holding check on destructor.
See pybind/pybind11#4769 for details on why this is a preferred solution rather than skipping the decref in all pybind object destructors.