Don't run user code in rpc until all involved RRefs are confirmed by owner #27098

@mrshenli

In #25499, the user code in rpc/remote is executed right away when the callee receives the call. But as agreed in the protocol in #26759, the user code should not run until all involved RRefs are confirmed by the owner, so that reference counting is handled correctly even if the user code triggers failures.
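To make the intended behavior concrete, here is a rough sketch of the gating idea in plain Python. It is not the actual RPC implementation: `PendingCall`, `confirm`, and the string RRef ids are hypothetical names made up for illustration. The callee records which RRefs are still unconfirmed and only runs the user function once that set is empty.

```python
import threading
from concurrent.futures import Future


class PendingCall:
    """Hypothetical holder for a received rpc/remote call.

    The user function is deferred until every involved RRef id
    has been confirmed by its owner.
    """

    def __init__(self, func, args, pending_rref_ids):
        self._func = func
        self._args = args
        self._pending = set(pending_rref_ids)  # RRefs not yet confirmed
        self._lock = threading.Lock()
        self._started = False
        self.result = Future()
        # If no RRefs are involved, there is nothing to wait for.
        self._maybe_run()

    def confirm(self, rref_id):
        """Called when the owner confirms one RRef (assumed callback)."""
        with self._lock:
            self._pending.discard(rref_id)
        self._maybe_run()

    def _maybe_run(self):
        with self._lock:
            if self._pending or self._started:
                return
            self._started = True
        # All RRefs are confirmed, so a failure in user code can no
        # longer race with the reference-counting messages.
        try:
            self.result.set_result(self._func(*self._args))
        except Exception as exc:
            self.result.set_exception(exc)


call = PendingCall(lambda a, b: a + b, (1, 2), {"rref_a", "rref_b"})
call.confirm("rref_a")
still_waiting = not call.result.done()  # user code has not run yet
call.confirm("rref_b")
value = call.result.result()  # user code ran after the last confirmation
```

In this sketch an exception raised by the user function is captured in the future only after every confirmation has arrived, which is the ordering the protocol asks for.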

cc @ezyang @gchanan @zou3519 @jerryzh168 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528

Labels

- `better-engineering`: Relatively self-contained tasks for better engineering contributors
- `high priority`
- `module: rpc`: Related to RPC, distributed autograd, RRef, and distributed optimizer
- `triaged`: This issue has been looked at a team member, and triaged and prioritized into an appropriate module
