Add RISCV64 CPU Support #7387
Conversation
@heyujiao99 - would it be worth adding this as a unique accelerator? Or do you foresee this as the only difference compilation-wise?
In the long run, a unique accelerator is worthwhile because RISC-V CPUs may have different acceleration/computation backends, which could differ from Intel CPUs. By the way, does the DeepSpeed community have plans to support CPUs with different architectures?
@heyujiao99 - We are certainly happy to support them if users are able to contribute/validate them. We don't have any to test with ourselves. Also @tjruwase, this is similar to our earlier discussion about switching cpu_accelerator to xeon_accelerator and making the existing cpu_accelerator more generic?
Hi @heyujiao99, do you know whether in PyTorch, RISCV64 shares the same 'cpu' device type with Intel devices? Do you see the RISCV64 difference only in OpBuilder, or do you foresee differences in other PyTorch code as well? If most code is the same and only OpBuilder is different, then an abstraction in OpBuilder and a sub-directory under …
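The device-type question above can be checked empirically. A minimal sketch, assuming only the standard library and PyTorch (the helper name `cpu_arch_subdir` and the directory names are illustrative, not DeepSpeed's actual layout):

```python
import platform

def cpu_arch_subdir() -> str:
    """Map the host ISA reported by the kernel to a hypothetical
    per-architecture OpBuilder source sub-directory."""
    machine = platform.machine().lower()
    if machine in ("x86_64", "amd64"):
        return "x86_64"
    if machine == "riscv64":
        return "riscv64"
    if machine in ("aarch64", "arm64"):
        return "aarch64"
    # Fall back to a generic, intrinsics-free kernel directory.
    return "generic"

print(cpu_arch_subdir())
```

On a RISC-V host, `torch.tensor([1.0]).device.type` would still report `'cpu'` if PyTorch treats riscv64 as an ordinary CPU backend, which is what this comment is asking about.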
@heyujiao99 @loadams @tjruwase following up on this discussion. From the current PR, I see that the main differences between the x86-64 CPU and the RISC-V CPU are how the C++ kernels look and how they are built. For building the kernels, the OpBuilder base class should have a way to choose the right build flags. For the kernels themselves, there are two situations:
With this mechanism we should be able to capture most differences between x86_64 and riscv, and it is also extensible to other CPU ISAs if needed.
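The flag-selection part of this mechanism could look like the following sketch. This is not DeepSpeed's real OpBuilder API; the class name, the flag table, and the flag values (e.g. `-march=rv64gcv` for the RISC-V vector extension) are illustrative assumptions:

```python
import platform

class CPUOpBuilderSketch:
    """Hypothetical base class: pick compiler flags for the detected ISA,
    so architecture-specific subclasses only need to supply sources."""

    # Example per-ISA flags; real build settings would need validation.
    ARCH_FLAGS = {
        "x86_64": ["-march=native", "-mavx2"],   # x86-64 SIMD example
        "riscv64": ["-march=rv64gcv"],           # RISC-V with vector extension
    }

    def cxx_flags(self):
        common = ["-O3", "-std=c++17", "-fopenmp"]
        # Unknown ISAs fall back to the portable common flags only.
        return common + self.ARCH_FLAGS.get(platform.machine().lower(), [])

flags = CPUOpBuilderSketch().cxx_flags()
print(flags)
```

A subclass per kernel would then inherit `cxx_flags()` unchanged, which is the reuse this comment argues for.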
@delock thanks for the analysis. I am aligned with your proposal that we should reuse as much as possible, even if some refactoring is needed. @heyujiao99, it would be great to get your thoughts on how to minimize code duplication in this PR. In particular, …
Thanks for all the discussions. I totally agree, and I will look for a more extensible way to support RISC-V CPUs. However, I found that some dependencies, such as PyTorch and cpuinfo, also lack mature support for RISC-V CPUs; I'll explore this further and attempt it later once I have a deeper understanding.
Thanks @heyujiao99! Looking forward to your PR. If you need to discuss anything, ping me on this thread or via email; I'll be happy to see a new architecture running DeepSpeed.
This patch adds riscv64 support for the deepspeed_shm_comm operator, enabling DeepSpeed to perform CPU training/inference on RISCV64 hosts, for research purposes. Based on the discussion in pull #7387, this patch refactors some original code to support multiple CPU architectures. The related tests have passed on both x86 and RISC-V CPUs, and I successfully ran Qwen2.5 on a RISC-V CPU:

```bash
(myenv) [root@openeuler-riscv64 DeepSpeed ]$ pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.11.4, pytest-7.2.0, pluggy-1.6.0 -- /root/myenv/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /root/ecosystem/DeepSpeed/tests, configfile: pytest.ini
plugins: mock-3.14.1, hypothesis-6.135.14, forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED [100%]

(myenv) root@ubuntu-2204:~/soft-working-dir/DeepSpeed# pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.12.3, pytest-7.2.0, pluggy-1.6.0 -- /root/soft-working-dir/myenv/bin/python3
cachedir: .pytest_cache
rootdir: /root/soft-working-dir/DeepSpeed/tests, configfile: pytest.ini
plugins: forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED [100%]
```

---------

Signed-off-by: heyujiao99 <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Ma, Guokai <[email protected]>
Hi,
This patch adds riscv64 support for the deepspeed_shm_comm operator, enabling DeepSpeed to perform pure CPU training/inference on RISCV64 hosts, for research purposes.
The related unit tests have passed:

```bash
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED [100%]
```