Closed
Labels
better-engineering (Relatively self-contained tasks for better engineering contributors)
high priority
module: rpc (Related to RPC, distributed autograd, RRef, and distributed optimizer)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
The above-mentioned methods (enqueueSend and enqueueRecv) do not handle exceptions correctly in the lambda functions they run in the thread pool. The thread pool swallows the exceptions, and the RPC ends up blocking forever because the future is never satisfied. For enqueueRecv, we should wrap the lambda body in a try-catch block and return MessageType::Exception. For enqueueSend, we can pass the future into the lambda, wrap the body in a try-catch block, and, when an exception is caught, mark the future completed with MessageType::Exception.
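A minimal sketch of the proposed pattern, using only the C++ standard library rather than the actual RPC agent code. `Message`, `guardTask`, and the `std::promise`-based completion are hypothetical stand-ins introduced for illustration; only `MessageType::Exception` is taken from the description above. The idea is that the task handed to the thread pool always completes the future, even when the wrapped work throws.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the RPC message types named in this issue.
enum class MessageType { Response, Exception };

struct Message {
  MessageType type;
  std::string payload;
};

// Wraps `work` in a task that is safe to hand to a thread pool: whatever
// happens inside `work`, the promise is satisfied exactly once, so a caller
// waiting on the matching future cannot block forever.
std::function<void()> guardTask(
    std::shared_ptr<std::promise<Message>> promise,
    std::function<Message()> work) {
  assert(promise && work);
  return [promise, work = std::move(work)]() {
    Message result{MessageType::Response, ""};
    try {
      result = work();
    } catch (const std::exception& e) {
      // Convert the exception into an error message instead of letting the
      // thread pool swallow it.
      result = Message{MessageType::Exception, e.what()};
    } catch (...) {
      result = Message{MessageType::Exception, "unknown error"};
    }
    promise->set_value(std::move(result));
  };
}
```

In this sketch the caller creates the promise, keeps the future obtained from `promise->get_future()`, and submits the returned task to the pool. A caller that later calls `get()` on the future would then see a `Message` of type `MessageType::Exception` instead of hanging.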
cc @ezyang @gchanan @zou3519 @jerryzh168 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528
mrshenli and rohan-varma