First, thank you for providing such a powerful and scalable framework for distributed training.
I would like to propose a small but practical feature that would help with system-level profiling using external tracing tools like CUPTI.
While torchrun already sets RANK, LOCAL_RANK, and WORLD_SIZE as environment variables, DeepSpeed
computes its parallelism ranks (TP/DP/PP) internally, via APIs like topo.get_tensor_model_parallel_rank().
Because these ranks are never exposed as environment variables, it is difficult to access them from low-level tools such as CUPTI.
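For context, the only way to recover these ranks outside DeepSpeed today is to re-derive them from the global rank, which requires hard-coding the topology. A minimal sketch of that workaround, assuming a (pipe, data, tensor) axis ordering and hard-coded parallel degrees (both assumptions; the real ordering depends on the configured topology):

import os

# Hypothetical workaround: re-derive parallel ranks from the global rank.
# tp_size/dp_size are assumed values that must be kept in sync with the config.
rank = int(os.environ["RANK"])
tp_size = 2  # assumed tensor-parallel degree
dp_size = 4  # assumed data-parallel degree
tp_rank = rank % tp_size                   # fastest-varying axis (assumed)
dp_rank = (rank // tp_size) % dp_size
pp_rank = rank // (tp_size * dp_size)      # slowest-varying axis (assumed)

Keeping this in sync with the actual DeepSpeed configuration is error-prone, which is what motivates the request below.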
Would you consider setting the following environment variables once parallelism has been initialized?
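# Expose parallelism ranks to external profilers (e.g., CUPTI-based tools)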
os.environ["TP_RANK"] = str(self._topo.get_tensor_model_parallel_rank(rank))
os.environ["PP_RANK"] = str(self._topo.get_pipeline_model_parallel_rank(rank))
os.environ["DP_RANK"] = str(self._topo.get_data_parallel_rank(rank))
If this sounds reasonable, I would be happy to open a PR for it.
Thank you again for your work and consideration!
Looking forward to your thoughts.
Best regards,
Woosung Myung