add --bind_cores_to_rank to zero offload tutorial #7474

delock · 2025-08-08T06:58:15Z

In ZeRO offload, significant time is spent in CPUAdam, which runs on the CPU. Passing --bind_cores_to_rank to the deepspeed launch command therefore helps improve ZeRO offload performance. This PR adds the flag to the ZeRO offload tutorial to increase user awareness.
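The flag is passed to the launcher ahead of the training script, for example `deepspeed --bind_cores_to_rank train.py ...`, where `train.py` is a hypothetical script name. To illustrate the idea behind core binding, here is a minimal conceptual sketch that gives each local rank its own contiguous slice of cores so that CPU-side optimizer threads from different ranks do not compete for the same cores. The function `split_cores` is hypothetical and is not DeepSpeed's implementation; the real launcher may differ, for example in how it counts cores or handles NUMA nodes.

```python
from __future__ import annotations


def split_cores(num_cores: int, num_ranks: int) -> list[range]:
    """Split core IDs 0..num_cores-1 into contiguous, near-equal slices, one per rank.

    Hypothetical helper for illustration only; not part of DeepSpeed.
    """
    if num_ranks <= 0 or num_cores < num_ranks:
        raise ValueError("need at least one core per rank")
    base, extra = divmod(num_cores, num_ranks)
    slices, start = [], 0
    for rank in range(num_ranks):
        # The first `extra` ranks absorb one leftover core each.
        size = base + (1 if rank < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


# Setup from this PR's measurement: 128 CPU cores shared by 2 ranks.
for rank, cores in enumerate(split_cores(128, 2)):
    print(f"rank {rank}: cores {cores.start}-{cores.stop - 1}")
```

With 128 cores and 2 ranks, this sketch assigns 64 cores to each rank.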

For Qwen2.5-3B fine-tuning on 2 A100-40B cards, on a host with 128 CPU cores, the average step time is as follows, a roughly 1.3x speedup:
without --bind_cores_to_rank: 3084.44 ms per step
with --bind_cores_to_rank: 2383.16 ms per step
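As a quick check of the "near 1.3x" figure, the speedup can be recomputed from the two step times reported above. The variable names are illustrative; only the two measurements come from this PR.

```python
# Average step times reported in this PR, in milliseconds.
baseline_ms = 3084.44  # without --bind_cores_to_rank
bound_ms = 2383.16     # with --bind_cores_to_rank

speedup = baseline_ms / bound_ms          # how many times faster each step is
reduction = 1.0 - bound_ms / baseline_ms  # fraction of step time saved

print(f"speedup: {speedup:.2f}x")
print(f"step time reduction: {reduction:.1%}")
```

The ratio comes out just under 1.3, which corresponds to a step time reduction of a little under 23%.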

docs/_tutorials/zero-offload.md

In ZeRO offload, significant time is spent in CPUAdam, which runs on the CPU. Passing `--bind_cores_to_rank` to the deepspeed launch command therefore helps improve ZeRO offload performance. This PR adds the flag to the ZeRO offload tutorial to increase user awareness.

For Qwen2.5-3B fine-tuning on 2 A100-40B cards, on a host with 128 CPU cores, the average step time is as follows, a roughly 1.3x speedup:
without `--bind_cores_to_rank`: 3084.44 ms per step
with `--bind_cores_to_rank`: 2383.16 ms per step

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: lym <[email protected]>

add --bind_cores_to_rank to zero offload tutorial

74213e2

delock requested review from loadams and tjruwase as code owners August 8, 2025 06:58

tjruwase reviewed Aug 8, 2025

docs/_tutorials/zero-offload.md (outdated, resolved)

Update docs/_tutorials/zero-offload.md

3ee0bca

tjruwase approved these changes Aug 8, 2025

hwchen2017 merged commit f03d416 into master Aug 8, 2025
2 checks passed

hwchen2017 deleted the gma/zero_offload_doc branch August 8, 2025 17:34
