-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Closed
Labels
better-engineeringRelatively self-contained tasks for better engineering contributorsRelatively self-contained tasks for better engineering contributorshigh priorityoncall: distributedAdd this issue/PR to distributed oncall triage queueAdd this issue/PR to distributed oncall triage queuetriage reviewtriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate moduleThis issue has been looked at a team member, and triaged and prioritized into an appropriate module
Description
DDP communication hook implementation #39272 provides an API called get_future to retrieve a future associated with the completion of c10d.ProcessGroupNCCL.work. get_future API is currently only supported by NCCL #41596 , because most of the intensive models, where our gradient compression effort might be more useful, use NCCL.
To support Gloo and MPI, we can modify some of the core ProcessGroupGloo and ProcessGroupMPI code to ensure we can do something similar there.
cc @ezyang @gchanan @zou3519 @bdhirsh @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @xush6528
Metadata
Metadata
Assignees
Labels
better-engineeringRelatively self-contained tasks for better engineering contributorsRelatively self-contained tasks for better engineering contributorshigh priorityoncall: distributedAdd this issue/PR to distributed oncall triage queueAdd this issue/PR to distributed oncall triage queuetriage reviewtriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate moduleThis issue has been looked at a team member, and triaged and prioritized into an appropriate module