Description
🚀 The feature, motivation and pitch
Motivation:
- Gives better fault tolerance against hangs in comm init, comm destroy, and P2P operations (which involve dynamic connection setup performed by the CPU).
- Allows overlap between NCCL init and other work the user wants the main thread to do (e.g., model or data loader init).
Here "non-blocking" refers to whether an NCCL API call returns immediately or blocks the host CPU. (Traditionally, some NCCL APIs, such as ncclCommInitRank, may block for a short time while rendezvous is performed.)
If the user wants the main thread to do other work while NCCL is initializing, this mode also helps because it moves NCCL init to the background.
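To make the two motivations concrete, here is a minimal sketch of the calling pattern that a non-blocking mode enables. It does not call NCCL: `FakeComm`, `InitState`, and `wait_with_overlap` are hypothetical names invented for illustration, and the "background progress" is simulated by a counter. The idea is only that a status query returns immediately, the host does other work between queries, and a bounded number of polls lets the caller notice a likely hang.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status of an in-flight communicator init (not an NCCL type).
enum class InitState { InProgress, Ready };

// Hypothetical stand-in for a communicator whose init was started in
// non-blocking mode. Each poll() simulates one step of background progress.
class FakeComm {
 public:
  explicit FakeComm(int steps_needed) : steps_left_(steps_needed) {
    assert(steps_needed >= 0);
  }

  // Returns the current state right away instead of blocking the host.
  InitState poll() {
    if (steps_left_ > 0) {
      --steps_left_;
    }
    return steps_left_ == 0 ? InitState::Ready : InitState::InProgress;
  }

 private:
  int steps_left_;
};

// Polls `comm` at most `max_polls` times and runs `other_work` after every
// poll that still reports InProgress. Returns true once init is ready, or
// false when the poll budget is exhausted, which the caller can treat as a
// suspected hang.
bool wait_with_overlap(FakeComm& comm,
                       int max_polls,
                       const std::function<void()>& other_work) {
  for (int i = 0; i < max_polls; ++i) {
    if (comm.poll() == InitState::Ready) {
      return true;
    }
    other_work();  // e.g. model or data loader init on the main thread
  }
  return false;
}
```

With `FakeComm comm(3)` and a budget of 10 polls, `wait_with_overlap` returns true after running `other_work` twice. With a communicator that never becomes ready within the budget, it returns false and the caller decides how to recover, rather than sitting inside a blocking call.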
Alternatives
No response
Additional context
This knob is controlled by PyTorch here:
pytorch/torch/csrc/distributed/c10d/NCCLUtils.cpp
Lines 90 to 97 in 2695698
    bool nccl_use_nonblocking() {
      static bool nccl_use_nonblocking_ =
          c10::utils::check_env("TORCH_NCCL_USE_COMM_NONBLOCKING") == true;
      if (nccl_use_nonblocking_) {
        TORCH_WARN_ONCE("Using experimental non-blocking NCCL communicator.");
      }
      return nccl_use_nonblocking_;
    }
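For readers who want to experiment with the knob's logic outside of PyTorch, the following is a rough, self-contained approximation of the check above. `check_env_sketch` and `use_nonblocking_sketch` are hypothetical names, and the parsing rule ("1" means true, "0" means false, anything else means unset) is an assumption about how such a helper behaves, not a copy of the c10 implementation. Unlike the quoted function, this sketch re-reads the environment on every call. The quoted function caches the result in a `static`, so the variable has to be set before the first call.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <optional>

// Simplified stand-in for an env-var check (assumed semantics, not c10 code):
// "1" -> true, "0" -> false, unset or any other value -> no value.
std::optional<bool> check_env_sketch(const char* name) {
  assert(name != nullptr);
  const char* value = std::getenv(name);
  if (value == nullptr) {
    return std::nullopt;
  }
  if (std::strcmp(value, "1") == 0) {
    return true;
  }
  if (std::strcmp(value, "0") == 0) {
    return false;
  }
  return std::nullopt;
}

// Same shape as the quoted function, minus the static cache and the warning.
// Comparing an empty optional with `true` yields false, so an unset or
// unparsable variable leaves the knob off.
bool use_nonblocking_sketch() {
  return check_env_sketch("TORCH_NCCL_USE_COMM_NONBLOCKING") == true;
}
```

Under these assumptions, only an explicit value of `1` turns the knob on. The `== true` comparison is what makes the unset case fall through to the default blocking behavior.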
cc @XilunWu @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o