Skip to content

Possible deadlock when using RPyC over paramiko #1634

@Jongy

Description

@Jongy

I'm using paramiko in a testing framework to facilitate control of remote machines. Some of the remote actions are done in Python, using RPyC. For this purpose, we open a direct-tcpip channel using the same SSH transport, and let RPyC use it.

It works really great, but a certain scenario can cause deadlocks: paramiko locks the transport before it begins sending anything on the underlying socket (Transport.clear_to_send_lock lock in Transport._send_user_message()). With the lock taken, the send operation proceeds in Transport.send_message(). Now, in case a GC is triggered (and there are possible trigger spots from send_message()), if an RPyC object is finalized then its destructor will notify the RPyC server about it, sending data, thus calling back into paramiko, which will deadlock when it tries to take clear_to_send_lock.

The deadlock scenario I encountered is down a Transport.open_session() (that's unrelated to RPyC, it's a new channel for command execution), inside Packetizer.send_message() the call to b() raises an exception and the constructor BaseException_new leads to a GC invocation.

RPyC protects itself from such a scenario: Originally by disabling GC during send operations (tomerfiliba-org/rpyc@8c1eff8) and nowadays by queuing data for later send if the lock is already taken ( tomerfiliba-org/rpyc@9ae608e).

Anyway, I solved it locally for the project using the 1st method (disabling GC altogether). Still,
it was pretty frustrating to debug (CI machines hanging while the only life sign I have is the output from pytest... 😢 ) and so for the sake of other developers, I want it fixed (perhaps like the 2nd solution of RPyC), or, at the very least, automatically detected (if owner_thread == self: raise RuntimeError("deadlock in send_user_message()!")) so one doesn't waste time on hangs.

I was surprised Python itself doesn't protect you from such simple deadlocks (if threading.RLock is re-entrant, I'd expect threading.Lock to tell you when you are erroneously re-entering). But sadly it doesn't. Definitely gonna ask about it in the Python mailing lists...

I'd be happy to open a PR with a possible fix / detection, please tell me what you think.

cc: @ty93

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions