First, thank you for providing such a powerful and scalable framework for distributed training.
I would like to propose a small but practical feature that would help with system-level profiling using external tracing tools like CUPTI.
While torchrun already sets RANK, LOCAL_RANK, and WORLD_SIZE as environment variables, DeepSpeed
computes its parallelism ranks (TP/DP/PP) internally, via APIs like topo.get_tensor_model_parallel_rank().
Because these ranks are never exposed as environment variables, it is difficult to access them from low-level tools such as CUPTI.
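For context, the only way to recover these ranks outside DeepSpeed today is to re-derive them from the global rank, which requires hard-coding the topology. A minimal sketch of that workaround, assuming a (pipe, data, tensor) axis ordering and hard-coded parallel degrees (both assumptions; the real ordering depends on the configured topology):

import os

# Hypothetical workaround: re-derive parallel ranks from the global rank.
# tp_size/dp_size are assumed values that must be kept in sync with the config.
rank = int(os.environ["RANK"])
tp_size = 2  # assumed tensor-parallel degree
dp_size = 4  # assumed data-parallel degree
tp_rank = rank % tp_size                   # fastest-varying axis (assumed)
dp_rank = (rank // tp_size) % dp_size
pp_rank = rank // (tp_size * dp_size)      # slowest-varying axis (assumed)

Keeping this in sync with the actual DeepSpeed configuration is error-prone, which is what motivates the request below.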
Would you consider setting the following environment variables once parallelism has been initialized?
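# Expose parallelism ranks to external profilers (e.g., CUPTI-based tools)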
os.environ["TP_RANK"] = str(self._topo.get_tensor_model_parallel_rank(rank))
os.environ["PP_RANK"] = str(self._topo.get_pipeline_model_parallel_rank(rank))
os.environ["DP_RANK"] = str(self._topo.get_data_parallel_rank(rank))
If this sounds reasonable, I would be happy to open a PR for it.
Thank you again for your work and consideration!
Looking forward to your thoughts.
Best regards,
Woosung Myung