
[REQUEST] Expose TP/DP/PP Ranks as Environment Variables for Integration with Low-Level Profiling Tools #7423

Description

@WoosungMyung

First, thank you for providing such a powerful and scalable framework for distributed training.

I would like to propose a small but practical feature that would help with system-level profiling using external tracing tools like CUPTI.

While torchrun already sets RANK, LOCAL_RANK, and WORLD_SIZE as environment variables, DeepSpeed computes the TP/DP/PP parallelism ranks internally, using APIs like topo.get_tensor_model_parallel_rank().

However, these ranks are not exposed as environment variables, which makes them difficult to access from low-level tools such as CUPTI.
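
To illustrate the gap: a tool attached to the process from outside the training code (for example, a CUPTI-based injection library) can portably read the process environment, so torchrun's variables are already visible there, but the parallelism coordinates are not. A minimal sketch of what is visible today:

    import os

    # Set by torchrun for every worker process, so any attached tool can read them:
    global_rank = int(os.environ["RANK"])
    local_rank = int(os.environ["LOCAL_RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # The TP/PP/DP coordinates, by contrast, live only in DeepSpeed's topology
    # object inside the Python process and are invisible to external tools.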

Would you consider adding the following environment variables once parallelism has been initialized? For example:

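    # Hypothetical placement: right after DeepSpeed builds its process topology,
    # publish the coordinates so out-of-process tools can read them.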
    os.environ["TP_RANK"] = str(self._topo.get_tensor_model_parallel_rank(rank))
    os.environ["PP_RANK"] = str(self._topo.get_pipeline_model_parallel_rank(rank))
    os.environ["DP_RANK"] = str(self._topo.get_data_parallel_rank(rank))

If this sounds reasonable, I would be happy to open a PR.

Thank you again for your work and consideration!
Looking forward to your thoughts.

Best regards,
Woosung Myung
