`__traverse__` should not alter any refcounts, at least in CPython

## Bug Description

Back when we've got a support for the [GC protocol](https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse), while it is not usual for `__traverse__` to do anything more than calling `visit.call`, it was assumed that [running any Python code is not harmful](https://github.com/PyO3/pyo3/pull/2159#issuecomment-1037002492) and currently `__traverse__` trampoline is no more than an ordinary trampoline with a slightly different signature.

It turns out that not only this assumption is wrong but also *any* currently generated `tp_traverse` implementation is prone to crash. CPython has [reused `PyGC_Head` fields to store intermediate refcounts](https://devguide.python.org/internals/garbage-collector/#optimization-reusing-fields-to-save-memory) for a long time, so `tp_traverse` should be never able to read or write any `PyGC_Head` fields.

In particular the `__traverse__` trampoline would first allocate a new `GILPool` and apply deferred refcount updates at once, possibly deallocating some objects in the pool and in turn somehow touching temporarily corrupted `PyGC_Head` (e.g. `PyObject_GC_Untrack` or `PyObject_GC_Del`). For this reason I believe this is the root cause of #1623 and #3064; both suggests that `PyGC_Head` is corrupted and threading seems to be relevant here (it is much easier to defer refcount updates in non-interpreter threads).

## Reproduction

<details><summary>Too lengthy to open by default</summary>

I was able to reliably trigger SIGSEGV on Python 3.8 through 3.11 in x86-64 Linux. Given the following `src/lib.rs` and typical `Cargo.toml` (which has been omitted):

```rust
use pyo3::{prelude::*, PyTraverseError, PyVisit};
use std::{cell::RefCell, thread};

#[pyclass]
struct Ref {
    inner: RefCell<Option<Py<Self>>>,
    clear_on_traverse: bool,
}

impl Ref {
    fn new(clear_on_traverse: bool) -> Self {
        Self {
            inner: RefCell::new(None),
            clear_on_traverse,
        }
    }
}

#[pymethods]
impl Ref {
    fn __traverse__(&self, visit: PyVisit<'_>) -> Result<(), PyTraverseError> {
        if self.clear_on_traverse {
            self.inner.take();
        } else {
            if let Some(inner) = &*self.inner.borrow() {
                visit.call(inner)?;
            }
        }
        Ok(())
    }

    fn __clear__(&mut self) {
        self.inner.take();
    }
}

#[pyfunction]
fn crash1() {
    thread::spawn(move || {
        _ = Python::with_gil(|py| -> PyResult<_> {
            let a = Py::new(py, Ref::new(false))?;
            a.borrow_mut(py).inner.replace(Some(a.clone_ref(py)));

            let b = Py::new(py, Ref::new(false)))?;
            Ok((a, b))
        });
        eprintln!("thread finished");
    });

    Ok(())
}

#[pyfunction]
fn crash2() {
    thread::spawn(move || {
        _ = Python::with_gil(|py| -> PyResult<_> {
            let a1 = Py::new(py, Ref::new(true))?;
            let a2 = Py::new(py, Ref::new(false))?;
            a1.borrow_mut(py).inner.replace(Some(a2.clone_ref(py)));
            a2.borrow_mut(py).inner.replace(Some(a1.clone_ref(py)));

            let b = Py::new(py, Ref::new(false)))?;
            Ok((a1, b))
        });
        eprintln!("thread finished");
    });
}

#[pymodule]
#[pyo3(name = "pyo3_crash_test")]
fn module_entry(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(crash1, m)?)?;
    m.add_function(wrap_pyfunction!(crash2, m)?)?;
    Ok(())
}
```

Running the following Python code will crash inside `gc.collect()`:

```python
import sys
import gc
import time
import pyo3_crash_test

pyo3_crash_test.crash1() # or crash2
time.sleep(0.1) # not strictly required, but ensures that the thread finishes before GC
print('collecting...', file=sys.stderr)
gc.collect()
assert False, 'crash expected during GC'
```

Here `Ref` is holding an optional reference to other `Ref`, possibly itself. Setting `clear_on_traverse` will clear the reference during the traversal, simulating an otherwise totally safe operation to do.

In the `crash1` the refcounts for `a` and `b` will be 2 and 1 respectively at the time of GC. GC will overwrite `PyGC_Head._gc_prev` for both `a` and `b` (effectively converting the list to singly-linked), and then `__traverse__` will be called for `a`, creating a new `GILPool` and triggering `ReferencePool::update_counts`. As both values are registered for `Py_DECREF` the refcount of `b` will be now 0, and this will eventually call `PyObject_GC_Del` which assumes that the list is still doubly-linked and crashes.

The situation is similar for `crash2`: `a2` gets dropped with the GIL still acquired, so the refcounts for `a1`, `a2` and `b` are 2, 1 and 1 respectively. Now assume that `GILPool::new` somehow recognizes `__traverse__` and doesn't call `update_counts`. Even in this case `a1` will now remove a reference to `a2` during `__traverse__`, falsely beliving `Py_DECREF` is safe to call as the GIL is acquired, so the same thing will happen for `a2`.

</details>

## Possible Solution

It seems evident to me that `__traverse__` should be handled as if no GIL is (and can be) acquired. This implies that `Python::with_gil` should probably fail and `gil_is_acquired` should return false inside `__traverse__`.

I believe it is also possible to have multiple nested `__traverse__` calls. The GC process drains the linked list of objects while updating `PyGC_Head` and it sets `gcstate->collecting` to 1 during the process anyway, so `__traverse__` cannot be called twice for the same object and nested calls should be rare, but there are at least three possible paths (as of the main branch of CPython):

* Creating new objects may trigger `PyObject_GC_Track`, which implicitly calls `tp_traverse` in debug builds.
* `gc.get_referrers` and `gc.get_referrents` use `tp_traverse` to collect objects, and are not affected by `gcstate->collecting`. They can be independently nested if `__traverse__` somehow calls those functions.
* While directly calling `gc.collect()` checks for `gcstate->collecting` and returns immediately, the automatic GC scheduling may result in a double GC. Specifically there is an interval between the scheduling (`_Py_ScheduleGC`) and the actual GC (`_Py_RunGC`) and some opcode may end up calling `gc.collect()` inbetween. This is extremely rare though, as this interval can be at most a single opcode, and is otherwise safe.

Probably it is sufficient to add the second "traversal" counter for the nested `__traverse__` calls in addition to `GIL_COUNT`. If the traversal counter is non-zero the GIL cannot be acquired and `gil_is_acquired` should return false regardless of `GIL_COUNT`. The `GIL_COUNT` is still updated as usual, but it is effectively ignored until the traversal counter drops to zero, at which point `update_counts` should be considered (but only when `GIL_COUNT` was non-zero). But I'm not very much confident about this solution for multiple reasons, including that there is no actual guarantee that we can call any other Python API from `tp_traverse` and someday CPython may touch `PyGC_Head` from unexpected functions---I mean, even `gcstate->collecting` check is not complete so who knows? If possible, a fully static solution to prevent anything but `PyVisit` would be much more desirable.

### Suggestions

Given a full solution needs much testing at least, I looked for possible workarounds without a local PyO3 update. Unfortunately `GILPool::new` call is mandatory for the trampoline and the only indirect approach would be temporarily untracking objects via `PyObject_GC_UnTrack`, but that pretty much contradicts why we have `tp_traverse` in the first place.

So it seems that PyO3 needs a minimal update to ensure that you remain safe if you don't do weird things inside `__traverse__`. No promises here, but I hope to file a small PR to disable `update_counts` from the trampoline soon. In my local testing this fixes `crash1` but not `crash2`.

## Metadata

### Your operating system and version

x86-64 Linux (at least)

### Your Python version (`python --version`)

3.8.10, 3.9.16, 3.10.11, 3.11.13

### Your Rust version (`rustc --version`)

1.69.0

### Your PyO3 version

0.18.3

### How did you install python? Did you use a virtualenv?

[`deadsnakes` PPA](https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa), the bundled version of `venv`, `pip` and the most recent version of `maturin` (0.15.2).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`traverse` should not alter any refcounts, at least in CPython #3165

Bug Description

Reproduction

Possible Solution

Suggestions

Metadata

Your operating system and version

Your Python version (`python --version`)

Your Rust version (`rustc --version`)

Your PyO3 version

How did you install python? Did you use a virtualenv?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

__traverse__ should not alter any refcounts, at least in CPython #3165

Description

Bug Description

Reproduction

Possible Solution

Suggestions

Metadata

Your operating system and version

Your Python version (python --version)

Your Rust version (rustc --version)

Your PyO3 version

How did you install python? Did you use a virtualenv?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`traverse` should not alter any refcounts, at least in CPython #3165

Your Python version (`python --version`)

Your Rust version (`rustc --version`)