@WoosungMyung WoosungMyung commented Jul 12, 2025

Thanks again for the opportunity to contribute to this community!
This PR addresses Issue #7423.

  1. Motivation

To improve compatibility with low-level profiling tools (e.g., NVIDIA CUPTI or DCGM), it can be useful to expose parallelism-specific rank (tensor/pipeline/data) at the engine level.

  2. Changes

I added three getter methods to DeepSpeedEngine:

  • get_tensor_parallel_rank()
  • get_pipeline_parallel_rank()
  • get_data_parallel_rank()
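To illustrate the semantics of these getters, here is a minimal standalone sketch, not DeepSpeed's actual implementation (which consults the model-parallel unit's process groups). It derives the three ranks from a flat global rank, under the common assumption that ranks are laid out with tensor parallelism varying fastest, then pipeline, then data parallelism; the function and parameter names (`parallel_ranks`, `tp_size`, `pp_size`) are illustrative, not DeepSpeed APIs.

```python
# Illustrative sketch only: derive tensor/pipeline/data parallel ranks
# from a flat global rank, assuming a tensor-fastest rank ordering
# (tensor -> pipeline -> data). Real engines read these from process groups.
def parallel_ranks(global_rank: int, tp_size: int, pp_size: int):
    tp_rank = global_rank % tp_size
    pp_rank = (global_rank // tp_size) % pp_size
    dp_rank = global_rank // (tp_size * pp_size)
    return tp_rank, pp_rank, dp_rank

# Example: a world of 8 ranks with tp_size=2 and pp_size=2 (so dp_size=2).
for r in range(8):
    tp, pp, dp = parallel_ranks(r, tp_size=2, pp_size=2)
    print(f"rank {r}: tp={tp} pp={pp} dp={dp}")
```

A profiling tool (e.g., one built on CUPTI or DCGM) could use values like these to label per-GPU metrics with the parallelism coordinates of each worker, which is the motivation stated above.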

Thank you for reviewing this contribution!

@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from 5df4bcc to 718fb78 Compare July 12, 2025 23:53
@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from 718fb78 to cc89694 Compare July 13, 2025 01:16
@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from 2ba0c78 to 726c30e Compare July 13, 2025 02:30
@WoosungMyung WoosungMyung force-pushed the feature/add-parallel-rank-api branch from c4b8193 to 6c8e092 Compare August 1, 2025 09:26
@WoosungMyung WoosungMyung requested a review from loadams as a code owner August 1, 2025 09:26
@WoosungMyung

WoosungMyung commented Aug 1, 2025

@sfc-gh-truwase
Hi, given that my PR only introduces a getter method in Accelerator and doesn't affect MoE or ZeRO-related logic, I believe these failures (xpu-max1100) are unrelated to the changes. Thanks!

@sfc-gh-truwase

@delock can you please help with the xpu CI? Thanks!

@loadams

loadams commented Aug 1, 2025

> @delock can you please help with the xpu CI? Thanks!

I've also reached out to @Liangliang-Ma on this. The test is currently skipped on this PR, so it should merge now.

@loadams loadams enabled auto-merge (squash) August 1, 2025 22:04
@loadams loadams merged commit 0e51e09 into deepspeedai:master Aug 1, 2025
9 checks passed
@WoosungMyung

Thanks a lot for merging the previous PR! I really appreciate the review and guidance throughout the process!

LYMDLUT pushed a commit to LYMDLUT/DeepSpeed that referenced this pull request Aug 20, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025