Closed
Labels
module: nccl (Problems related to nccl support), oncall: distributed (Add this issue/PR to distributed oncall triage queue)
Description
🚀 The feature, motivation and pitch
I'm analyzing distributed applications built on torch c10d, and the profiling title of the NCCL barrier is confusing.
The NCCL barrier implementation currently calls an allreduce operation, but the profiling title is left as 'nccl:all_reduce'. It would help when debugging distributed applications if the title hinted that the call is a barrier.
I suggest changing the title to something like 'nccl:all_reduce_barrier'.
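Until the title changes, one application-side stopgap is to wrap the barrier in a named profiler range so it stands out in the trace. The following is a minimal sketch, not a proposed patch: it assumes a single-process group on the gloo backend so that it runs without GPUs, and the label `user:barrier` and the helper name `profile_labeled_barrier` are arbitrary names chosen for illustration, not existing c10d titles.

```python
import os
import tempfile

import torch.distributed as dist
from torch.profiler import ProfilerActivity, profile, record_function


def profile_labeled_barrier() -> list[str]:
    """Run one barrier inside a named profiler range and return the event names."""
    # Assumption: a one-rank gloo group with a file-based rendezvous,
    # used only so the sketch runs on a CPU-only machine.
    store_path = os.path.join(tempfile.mkdtemp(), "store")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{store_path}",
        rank=0,
        world_size=1,
    )
    try:
        with profile(activities=[ProfilerActivity.CPU]) as prof:
            # "user:barrier" is a hypothetical label; any string works.
            with record_function("user:barrier"):
                dist.barrier()
        # Collect the aggregated event names recorded by the profiler.
        return [event.key for event in prof.key_averages()]
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print("user:barrier" in profile_labeled_barrier())
```

This only marks the call site in user code. With the NCCL backend, the underlying collective would still be expected to show up under its existing title, nested inside the named range, which is why a distinct title in c10d itself would be preferable.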
Alternatives
No response
Additional context
No response
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o
mori360