@WoosungMyung WoosungMyung commented Jul 12, 2025

Thanks again for the opportunity to contribute to this community!
This PR addresses Issue #7423.

  1. Motivation

To improve compatibility with low-level profiling tools (e.g., NVIDIA CUPTI or DCGM), it can be useful to expose parallelism-specific rank (tensor/pipeline/data) at the engine level.

  2. Changes

I added three getter methods to DeepSpeedEngine:

  • get_tensor_parallel_rank()
  • get_pipeline_parallel_rank()
  • get_data_parallel_rank()
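To illustrate the semantics of these getters, here is a minimal standalone sketch, not DeepSpeed's actual implementation (which consults the model-parallel unit's process groups). It derives the three ranks from a flat global rank, under the common assumption that ranks are laid out with tensor parallelism varying fastest, then pipeline, then data parallelism; the function and parameter names (`parallel_ranks`, `tp_size`, `pp_size`) are illustrative, not DeepSpeed APIs.

```python
# Illustrative sketch only: derive tensor/pipeline/data parallel ranks
# from a flat global rank, assuming a tensor-fastest rank ordering
# (tensor -> pipeline -> data). Real engines read these from process groups.
def parallel_ranks(global_rank: int, tp_size: int, pp_size: int):
    tp_rank = global_rank % tp_size
    pp_rank = (global_rank // tp_size) % pp_size
    dp_rank = global_rank // (tp_size * pp_size)
    return tp_rank, pp_rank, dp_rank

# Example: a world of 8 ranks with tp_size=2 and pp_size=2 (so dp_size=2).
for r in range(8):
    tp, pp, dp = parallel_ranks(r, tp_size=2, pp_size=2)
    print(f"rank {r}: tp={tp} pp={pp} dp={dp}")
```

A profiling tool (e.g., one built on CUPTI or DCGM) could use values like these to label per-GPU metrics with the parallelism coordinates of each worker, which is the motivation stated above.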

Thank you for reviewing this contribution!

@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from 5df4bcc to 718fb78 Compare July 12, 2025 23:53
@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from 718fb78 to cc89694 Compare July 13, 2025 01:16
@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from 2ba0c78 to 726c30e Compare July 13, 2025 02:30
@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from c4b8193 to 6c8e092 Compare August 1, 2025 09:26
@WoosungMyung WoosungMyung requested a review from loadams as a code owner August 1, 2025 09:26
@WoosungMyung

WoosungMyung commented Aug 1, 2025

@sfc-gh-truwase
Hi, given that my PR only introduces a getter method in Accelerator and doesn't affect MoE or ZeRO-related logic, I believe these failures (xpu-max1100) are unrelated to the changes. Thanks!

@sfc-gh-truwase

@delock can you please help with the xpu CI? Thanks!

@loadams

loadams commented Aug 1, 2025

> @delock can you please help with the xpu CI? Thanks!

I've also reached out to @Liangliang-Ma on this. The test is currently skipped on this PR, so it should merge now.

@loadams loadams enabled auto-merge (squash) August 1, 2025 22:04
@loadams loadams merged commit 0e51e09 into deepspeedai:master Aug 1, 2025
9 checks passed
@WoosungMyung

Thanks a lot for merging the previous PR! I really appreciate the review and guidance throughout the process!

LYMDLUT pushed a commit to LYMDLUT/DeepSpeed that referenced this pull request Aug 20, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025