
Don't hold locks in RecvWork::wait and SendWork::wait in ProcessGroupGloo #30164

@rohan-varma

Description


🚀 Feature

Per @pietern's comment in #29928, we can avoid holding the lock in ProcessGroupGloo when waiting on send/recv operations:

"The lock is there only to synchronize a background thread that executes the work and marks it as completed, not multiple waiters. For the send/recv work, that background thread is the actual Gloo I/O thread that marks a send/recv as completed, so for this case we could get rid of the locks entirely."

See comments in the PR for additional context.
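The argument in the quote can be illustrated with a minimal C++ sketch. It rests on stated assumptions. `HypotheticalBuffer`, `SketchRecvWork`, and `runConcurrentWaiters` are illustrative names only, not the actual ProcessGroupGloo or Gloo classes. The buffer stands in for a transport buffer that has its own internal synchronization and whose completion is signalled by a background I/O thread. That mutex and condition variable already order every waiter with the completing thread. So `SketchRecvWork::wait()` can delegate without taking a Work-level lock, and several threads can wait on the same work at once.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a transport buffer whose completion is signalled
// by a background I/O thread. It owns its synchronization, so callers do not
// need an extra lock around waiting on it.
class HypotheticalBuffer {
 public:
  // Called by the (simulated) I/O thread once the recv has landed.
  void markRecvComplete() {
    {
      std::lock_guard<std::mutex> guard(mu_);
      recvDone_ = true;
    }
    cv_.notify_all();  // wake every waiter, not just one
  }

  // Blocks until the I/O thread has marked the recv complete.
  void waitRecv() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return recvDone_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool recvDone_ = false;
};

// Hypothetical recv work object. wait() holds no Work-level mutex; it relies
// entirely on the buffer's internal synchronization.
class SketchRecvWork {
 public:
  explicit SketchRecvWork(HypotheticalBuffer& buffer) : buffer_(buffer) {}

  void wait() { buffer_.waitRecv(); }

 private:
  HypotheticalBuffer& buffer_;
};

// Starts several concurrent waiters plus one simulated I/O thread and
// returns how many waiters observed completion.
int runConcurrentWaiters(int numWaiters) {
  HypotheticalBuffer buffer;
  SketchRecvWork work(buffer);
  std::mutex countMu;
  int completed = 0;

  std::vector<std::thread> waiters;
  for (int i = 0; i < numWaiters; ++i) {
    waiters.emplace_back([&] {
      work.wait();
      std::lock_guard<std::mutex> guard(countMu);
      ++completed;
    });
  }

  // Simulated I/O thread marking the recv as completed.
  std::thread io([&] {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    buffer.markRecvComplete();
  });

  io.join();
  for (auto& t : waiters) t.join();
  return completed;
}
```

In this sketch, only the thread that completes the operation has to synchronize with waiters, and that synchronization lives in the buffer rather than in the work object. A send-side counterpart would presumably mirror this with a wait-for-send call on the same kind of buffer.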

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528

Metadata

Labels

better-engineering: Relatively self-contained tasks for better engineering contributors
module: rpc: Related to RPC, distributed autograd, RRef, and distributed optimizer
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
