[C10D] Add better profiling title for NCCL barrier #140257

@x41lakazam

Description

🚀 The feature, motivation and pitch

I'm working on analyzing distributed applications based on torch c10d, and the profiling title of the NCCL barrier is confusing.

Currently, the NCCL barrier implementation calls an allreduce operation, but its profiling title is left as 'nccl:all_reduce'. A hint that the call is a barrier would make distributed applications easier to debug.

I suggest changing the title to something like 'nccl:all_reduce_barrier'.
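
Until the title changes, a user-side workaround is possible. The sketch below is not the requested c10d change. It only wraps `torch.distributed.barrier` in a `torch.profiler.record_function` range so that the trace shows a labeled span around the call. The helper name `labeled_barrier` and its default label are hypothetical, chosen here for illustration, and the sketch assumes the process group is already initialized.

```python
import torch.distributed as dist
from torch.profiler import record_function


def labeled_barrier(label: str = "nccl:all_reduce_barrier", group=None) -> None:
    """Run a barrier inside a named profiler range.

    Hypothetical helper, not part of PyTorch. The label is only a
    user annotation; it does not rename the op that c10d records.
    """
    # record_function opens a named range in the profiler trace;
    # anything recorded during the barrier call nests inside it.
    with record_function(label):
        dist.barrier(group=group)
```

With this wrapper, the underlying 'nccl:all_reduce' entry should still appear in the trace, nested under the labeled range, so the barrier can be told apart from ordinary allreduce calls.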

Alternatives

No response

Additional context

No response

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Labels: module: nccl (problems related to nccl support); oncall: distributed (add this issue/PR to distributed oncall triage queue)
