Skip to content

Conversation

@albanD
Copy link
Collaborator

@albanD albanD commented Aug 1, 2023

Fix issues with new asserts introduced in 3.12 and pybind gil holding check on destructor.
See pybind/pybind11#4769 for details on why this is a preferred solution rather than skipping the decref in all pybind object destructors.

@albanD albanD requested a review from ezyang August 1, 2023 20:03
@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Aug 1, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 1, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106401

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 017bfbc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from the pybind11_abseil code I pointed you to, in two ways:

py::exception overloads operator() as I gather you found, and you're pasting the implementation here. I think what you have here should work, but I see you wrote (pybind/pybind11#4769 (comment)):

I don't seem to be able to get the release()-ed py::exception to be used at all.

Does this not work?

Maybe related, maybe not:

The pybind11_abseil code silently assumes that the interpreter is only initialized once in the process, and only finalized once (always true in the environments I usually work in). If the interpreter is restarted, I don't know how to prevent undefined behavior. Really what we need for that case is that the PyObject * behind exc here is reset in the interpreter shutdown.

But is that even your situation (multiple interpreter initialize/finalize cycles)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not work?

It does not, I tried any possible variant of py::reinterpret_borrow<py::exception<JITException>>(h) and py::cast<py::exception<JITException>>(h) but there isn't any constructor on the exception class that takes a handle as input. So that fails.
Maybe it would be ok to add such a constructor? I'm not sure why it is not inherited from the object class.

error: could not convert ‘{h, pybind11::object::borrowed_t()}’ from ‘<brace-enclosed initializer list>’ to ‘pybind11::exception<torch::jit::JITException>’

But is that even your situation (multiple interpreter initialize/finalize cycles)?

No multiple interpreter from the CPython sense. And this is a problem for me even without multipy (which we can ignore for now).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not, I tried any possible variant

I stared at this a intensely ...

https://github.com/pybind/pybind11/blob/f3e0602802c7840992c97f4960515777cad6a5c7/include/pybind11/pybind11.h#L2535-L2553

but I'm at a loss why your change doesn't work. py::exception inherits from object and doesn't add any members. All it has is m_ptr aka .ptr(). All we're doing with the .release() ...

https://github.com/pybind/pybind11/blob/f3e0602802c7840992c97f4960515777cad6a5c7/include/pybind11/pytypes.h#L368-L372

... is to reset m_ptr in object and returning it via a new handle.

So I created this pybind11 PR: pybind/pybind11#4772

It's basically the same change as you have here. It passes with all sanitizers I have access to.

In your situation I'd fall back to experimenting with (seemingly) silly things until I hit on the root cause.

For example, undo the change above (i.e. keep the static ... line as is).

But keep the change down here.

Does it still not work?

What's the error/symptom?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ho well TIL: https://softwareengineering.stackexchange.com/questions/197893/why-are-constructors-not-inherited
So if we want to be able to do py::reinterpret_borrow<py::exception<JITException>>(h) then we must add the following to the exception class in pybind:

    exception(handle h, bool is_borrowed) : object(h, is_borrowed) {}
    exception(handle h, borrowed_t) : object(h, borrowed_t{}) {}
    exception(handle h, stolen_t) : object(h, stolen_t{}) {}

    // Or if you're C++11 +
    using object::object;

With this change (either explicit or using-based), I can locally change from PyErr_SetString(exc.ptr(), e.what()); to py::reinterpret_borrow<py::exception<JITException>>(exc)(e.what()); and everything works!

Would you be happy with adding these constructors to the pybind base exception class (or the using statement if pybind if C++11)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we want to be able to do py::reinterpret_borrow<py::exception>(h)

I don't think that's a good direction.

I had a run-in with the py::exception design before, I think it's really unfortunate to overload operator() for this purpose.

I think it's better to provide py::set_error(py::handle, const char*). That will work for any Python exception type. And deprecate the overloaded operator().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you prefer this design, that sounds good to me.

I'll keep the PyErr_SetString here for now and I can migrate it to py::set_error when it is available in our version of pybind11 then!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with PyBind, but if object is allocated that way, what would prevent Python runtime from grabage collecting it?
I know this looks ugly, but if we want to keep the object referenced but never call it'd destructor perhaps something like static auto * exc = new py::exception<JITException>(m, "JITException"); will do better? (Or I think I have a dummy unique_pointer with leaky destrutor)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might actually be better!

(I guess I was too trapped in the handle idea which happens to work for our purposes just fine, I didn't really consider alternatives.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would prevent Python runtime from grabage collecting it

The reference count will never go to zero, the deallocation will never trigger.

Python doesn't keep a registry of all objects AFAIK, at least not in production configuration, therefore I think the Py_DECREF trigger is the only way there is to get to the deallocation code. Please correct me if I'm wrong.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like static auto * exc = new py::exception(m, "JITException"); will do better?

@malfet @albanD Could you please take a look here?

https://github.com/pybind/pybind11/pull/4772/files

(For future reference, at the point of this writing that was pybind/pybind11@9a4eba5)

I'm having 2nd thoughts, although it's 99.99% a style opinion. (0.01% that there is an extra dynamic memory allocation with the new alternative that is not actually needed.)

My thinking:

  • Using the py::handle ... .release(); alternative combined with py::set_error() is more concise / looks nicer IMO, seems very intuitive to me, compared to the surprising and evidently troublesome operator() overload.

  • In retrospect, we should have added py::set_error() a very long time ago. (There are 42 PyErr_SetString() across all files in the pybind11 repo, I've not replaced them all yet.)

Please let me know what you think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python doesn't keep a registry of all objects AFAIK

It does keep such a registry for all GC-able objects.
But release() leak one reference indeed. So this objects refcount will never be able to reach 0 and so will never be able to be de-allocated.

Could you please take a look here?

I agree that this is very much style. My personal vote is that avoiding the new is simpler in this context.
@malfet are you ok with me merging the current version of the PR? (I'll update to py::set_error() once we upgrade to a version of pybind that has that).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me, build please add a comment saying that this object will never be GCed because of the reasons outlined above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me, build please add a comment saying that this object will never be GCed because of the reasons outlined above

@albanD
Copy link
Collaborator Author

albanD commented Aug 7, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 7, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

rwgk added a commit to pybind/pybind11 that referenced this pull request Aug 8, 2023
…#4772)

* Copy clang 17 compatibility fixes from PR #4762 to a separate PR.

* static py::exception<> -> static py::handle

* Add `py::set_error()` but also try the suggestion of @malfet (pytorch/pytorch#106401 (review)).

* clang 17 compatibility fixes (#4767)

* Copy clang 17 compatibility fixes from PR #4762 to a separate PR.

* Add gcc:13 C++20

* Add silkeh/clang:16-bullseye C++20

* chore(deps): update pre-commit hooks (#4770)

updates:
- [github.com/psf/black: 23.3.0 → 23.7.0](psf/black@23.3.0...23.7.0)
- [github.com/astral-sh/ruff-pre-commit: v0.0.276 → v0.0.281](astral-sh/ruff-pre-commit@v0.0.276...v0.0.281)
- [github.com/asottile/blacken-docs: 1.14.0 → 1.15.0](adamchainz/blacken-docs@1.14.0...1.15.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* docs: Remove upper bound on pybind11 in example pyproject.toml for setuptools (#4774)

* docs: Remove upper bound on pybind11 in example pyproject.toml for setuptools

* Update docs/compiling.rst

---------

Co-authored-by: Henry Schreiner <[email protected]>

* Provide better type hints for a variety of generic types (#4259)

* Provide better type hints for a variety of generic types

* Makes better documentation
* tuple, dict, list, set, function

* Move to py::typing

* style: pre-commit fixes

* Update copyright line with correct year and actual author. The author information was copy-pasted from the git log output.

---------

Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use `py::set_error()` everywhere possible (only one special case, in common.h).
Overload `py::set_error(py::handle, py::handle)`.
Change back to `static py::handle exc = ... .release();`
Deprecate `py::exception<>::operator()`

* Add `PYBIND11_WARNING_DISABLE` for INTEL and MSVC (and sort alphabetically).

* `PYBIND11_WARNING_DISABLE_INTEL(10441)` does not work.

For ICC only, falling back to the recommended `py::set_error()` to keep the testing simple.

It is troublesome to add `--diag-disable=10441` specifically for test_exceptions.cpp, even that is non-ideal because it covers the entire file, not just the one line we need it for, and the value of exercising the trivial deprecated `operator()` on this one extra platform is practically zero.

* Fix silly oversight.

* NVHPC 23.5.0 generates deprecation warnings. They are currently not treated as errors, but falling back to using `py::set_error()` to not have to deal with that distraction.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Keto D. Zhang <[email protected]>
Co-authored-by: Henry Schreiner <[email protected]>
Co-authored-by: Dustin Spicuzza <[email protected]>
Cyril-Anto pushed a commit to Cyril-Anto/pytorch that referenced this pull request Aug 17, 2023
Fix issues with new asserts introduced in 3.12 and pybind gil holding check on destructor.
See pybind/pybind11#4769 for details on why this is a preferred solution rather than skipping the decref in all pybind object destructors.
Pull Request resolved: pytorch#106401
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/Skylion007
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: jit release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants