SuperOffload Release #7559
Conversation
@xylian86 please update README and folder with an appropriate
@sfc-gh-truwase Sure, I have updated it.
Sorry, I just realized that I shared this feedback on the wrong PR :(. This feedback was meant for the DSE PR.
Yep, I’d already caught that and added the
@sfc-gh-truwase It seems that merging is blocked. |
@xylian86 @sfc-gh-truwase Received an error with this PR: `ModuleNotFoundError: No module named 'deepspeed.runtime.superoffload'`.
@nguyen599 I double-checked this PR and confirmed that it works without enabling superoffload. Could you try reinstalling DeepSpeed from source? This should ensure that all the latest modules are available.
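The exact install command from this comment wasn't captured in the thread; below is a minimal sketch of a from-source reinstall, assuming the default branch of deepspeedai/DeepSpeed already contains this PR:

```python
# Hedged sketch: force-reinstall DeepSpeed from source so newly added
# packages (e.g. deepspeed.runtime.superoffload) are included in the install.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--upgrade", "--force-reinstall", "--no-cache-dir",
    "git+https://github.com/deepspeedai/DeepSpeed.git",  # pin a tag/commit as needed
])
```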
Ah, thank you, it works well now. @xylian86
@xylian86 I know what happened: you forgot to create `__init__.py` for the superoffload folder.
This PR fixes a tiny error in PR [7559](#7559), reported in [this comment](#7559 (comment)):

```
[rank1]: File "/usr/local/lib/python3.11/dist-packages/deepspeed/runtime/engine.py", line 1462, in _configure_optimizer
[rank1]:     self.optimizer = self._configure_zero_optimizer(basic_optimizer)
[rank1]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/deepspeed/runtime/engine.py", line 1835, in _configure_zero_optimizer
[rank1]:     from deepspeed.runtime.superoffload.superoffload_stage3 import SuperOffloadOptimizer_Stage3
[rank1]: ModuleNotFoundError: No module named 'deepspeed.runtime.superoffload'
```

It creates an `__init__.py` for the superoffload folder to avoid the import error when the superoffload folder is ignored by pip installation.

---------

Signed-off-by: nguyen599 <[email protected]>
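For context on why the missing file broke pip installs, here is a minimal sketch, assuming DeepSpeed's setup.py collects packages with setuptools' `find_packages` (which only picks up directories containing an `__init__.py`):

```python
# Run from a DeepSpeed source checkout. Without
# deepspeed/runtime/superoffload/__init__.py, find_packages() omits the
# folder, so it never lands in the wheel and the import in engine.py
# fails at runtime with ModuleNotFoundError.
from setuptools import find_packages

packages = find_packages(include=["deepspeed", "deepspeed.*"])
print("deepspeed.runtime.superoffload" in packages)  # False before the fix, True after
```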
This PR adds a blog post for SuperOffload. More specifically, the blog covers the design and motivation behind SuperOffload, comparisons with previous approaches, key experiences and insights, and guidance on enabling and using SuperOffload.

See also:
- [PR#7559](#7559) - SuperOffload implementation.
- [PR#990](deepspeedai/DeepSpeedExamples#990) - Examples.

---------

Signed-off-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
This PR introduces **SuperOffload**—an optimizer designed for Superchips (Nvidia GH200 & GB200, AMD MI300A) with high CPU–GPU bandwidth. It enables **full fine-tuning** of **GPT-OSS-20B, Qwen3-14B, and Phi-4** on a single GH200 GPU, achieving up to **~500 TFLOPS**, using Hugging Face Transformers and DeepSpeed—no custom modeling code required.

SuperOffload extends ZeRO-Offload with fine-grained control and CPUAdam rollback utilities, allowing GPU execution to overlap with CPUAdam. This reduces GPU idle time and improves overall efficiency.

Key changes:
- New SuperOffloadOptimizer_Stage3 optimizer.
- C++/CUDA binding for adam_rollback to revert one optimization step.
- Config additions including super_offload and cpuadam_cores_perc (see the config sketch after this description).

A detailed blog and tutorial will be available soon.

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
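The thread doesn't show a full configuration, so here is a minimal sketch of enabling SuperOffload; the nesting of `super_offload` and `cpuadam_cores_perc` under `zero_optimization`, and the 0.8 core fraction, are assumptions rather than confirmed details from this PR:

```python
# Hedged sketch of a DeepSpeed config enabling SuperOffload.
# Assumptions: ZeRO stage 3 with optimizer offload to CPU; the new keys
# sit under "zero_optimization" (exact placement may differ in the PR).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "super_offload": True,       # select SuperOffloadOptimizer_Stage3
        "cpuadam_cores_perc": 0.8,   # fraction of CPU cores reserved for CPUAdam
    },
}

# Typical usage (model comes from the user's own training script):
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```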