Conversation

@heyujiao99 (Contributor)

Hi,
This patch adds riscv64 support for the deepspeed_shm_comm operator, enabling DeepSpeed to perform pure CPU training/inference on RISCV64 hosts for research purposes.

The related unit tests have passed:

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED [100%]

@loadams (Collaborator) commented Jun 25, 2025

@heyujiao99 - would it be worth adding this as a unique accelerator? Or do you foresee this as the only difference compilation-wise?

@heyujiao99 (Contributor, author) commented Jun 26, 2025

> @heyujiao99 - would it be worth adding this as a unique accelerator? Or do you foresee this as the only difference compilation-wise?

In the long run, a unique accelerator is worthwhile because RISC-V CPUs may have acceleration/computation backends that differ from Intel CPUs'. By the way, does the DeepSpeed community have plans to support CPUs of different architectures?

@loadams (Collaborator) commented Jun 27, 2025

> @heyujiao99 - would it be worth adding this as a unique accelerator? Or do you foresee this as the only difference compilation-wise?
>
> In the long run, a unique accelerator is worthwhile because RISC-V CPUs may have acceleration/computation backends that differ from Intel CPUs'. By the way, does the DeepSpeed community have plans to support CPUs of different architectures?

@heyujiao99 - We are certainly happy to support them if users are able to contribute/validate them; we don't have any to test with ourselves. Also @tjruwase, this is similar to our earlier discussion about switching cpu_accelerator to xeon_accelerator and making the existing cpu_accelerator more generic?

@delock (Collaborator) commented Jul 3, 2025

Hi @heyujiao99, do you know whether, in PyTorch, RISCV64 shares the same 'cpu' device type as Intel CPUs? Do you see RISCV64 differences only in OpBuilder, or do you foresee differences in other PyTorch code as well? If most code is the same and only OpBuilder differs, then an abstraction in OpBuilder plus sub-directories under csrc/cpu/, such as csrc/cpu/comm/x86_64 and csrc/cpu/comm/riscv64, might be the better choice.
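The sub-directory layout suggested above could be wired into the build roughly as follows. This is a minimal Python sketch, not DeepSpeed's actual OpBuilder API; the helper name and directory layout are hypothetical illustrations of the idea.

```python
import os
import platform

def arch_specific_sources(base_dir, filenames, machine=None):
    """Return kernel source paths under an architecture sub-directory,
    e.g. csrc/cpu/comm/x86_64/ or csrc/cpu/comm/riscv64/ (hypothetical layout)."""
    # platform.machine() reports the host ISA, e.g. 'x86_64' or 'riscv64'
    machine = machine or platform.machine()
    return [os.path.join(base_dir, machine, f) for f in filenames]
```

On an x86-64 host, `arch_specific_sources("csrc/cpu/comm", ["shm.cpp"])` would resolve to `csrc/cpu/comm/x86_64/shm.cpp`, while the same call on a RISC-V host would pick up the riscv64 variant, keeping the OpBuilder logic architecture-agnostic.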

@delock (Collaborator) commented Jul 9, 2025

@heyujiao99 @loadams @tjruwase following up on this discussion. From the current PR, I see that the main differences between x86-64 CPUs and RISC-V CPUs are what the C++ kernels look like and how to build them. For building the kernels, the OpBuilder base class should have a way to choose the right build flags. For the kernels themselves, there are two situations:

  1. The kernel code differs between x86_64 and riscv; in that case, we need a naming convention to help OpBuilder select the right source code to build. The idea is to give a 'base' name and let OpBuilder add the right prefix or suffix to pick the right kernel code.
  2. The kernel code is the same between x86_64 and riscv; in that case, the same source code can be used.

That said, we could introduce a mechanism that picks kernel source code with the following logic: a. pick 'machine'-specific kernel code if possible; b. pick generic kernel code if no machine-specific code is found; c. raise an error if neither is found.

With this mechanism we should be able to catch most differences between x86_64 and riscv, and it is also extendable to other CPU ISAs if needed.
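The three-step fallback logic could be sketched like this. Again a hypothetical helper, assuming sources live under `<src_root>/<machine>/` with a `generic` fallback directory; the names are illustrative, not DeepSpeed's actual layout.

```python
import os
import platform

def select_kernel_source(src_root, base_name, machine=None):
    """Resolve a kernel source file, preferring machine-specific code.

    a. pick machine-specific code, e.g. <src_root>/riscv64/<base_name>
    b. fall back to generic code,  e.g. <src_root>/generic/<base_name>
    c. raise an error if neither exists
    """
    machine = machine or platform.machine()
    for subdir in (machine, "generic"):
        candidate = os.path.join(src_root, subdir, base_name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        f"no kernel source for {base_name!r} under {src_root!r} "
        f"(tried {machine!r} and 'generic')")
```

With this shape, adding a new ISA means dropping optimized sources into a new sub-directory; existing architectures keep building from `generic` until someone contributes a tuned variant.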

@tjruwase (Contributor) commented Jul 9, 2025

@delock thanks for the analysis. I am aligned with your proposal that we should reuse as much as possible, even if some refactoring is needed.

@heyujiao99, it would be great to get your thoughts on how to minimize code duplication in this PR. In particular, shm-riscv64.cpp appears to duplicate shm.cpp.

@heyujiao99 (Contributor, author)

> @delock thanks for the analysis. I am aligned with your proposal that we should reuse as much as possible, even if some refactoring is needed.
>
> @heyujiao99, it would be great to get your thoughts on how to minimize code duplication in this PR. In particular, shm-riscv64.cpp appears to duplicate shm.cpp.

Thanks for all the discussions. I totally agree, and I will look for a more extendable way to support RISC-V CPUs. However, some dependencies, such as PyTorch and cpuinfo, still lack mature support for RISC-V CPUs; I'll explore this further and attempt it again once I have a deeper understanding.

heyujiao99 closed this Jul 10, 2025
@delock (Collaborator) commented Jul 10, 2025

> @delock thanks for the analysis. I am aligned with your proposal that we should reuse as much as possible, even if some refactoring is needed.
> @heyujiao99, it would be great to get your thoughts on how to minimize code duplication in this PR. In particular, shm-riscv64.cpp appears to duplicate shm.cpp.
>
> Thanks for all the discussions. I totally agree, and I will look for a more extendable way to support RISC-V CPUs. However, some dependencies, such as PyTorch and cpuinfo, still lack mature support for RISC-V CPUs; I'll explore this further and attempt it again once I have a deeper understanding.

Thanks @heyujiao99! Looking forward to your PR. If you need to discuss, ping me on this thread or via email; I'll be happy to see a new architecture running DeepSpeed.

delock added a commit that referenced this pull request Aug 29, 2025
This patch adds riscv64 support for the deepspeed_shm_comm operator, enabling DeepSpeed to perform CPU training/inference on RISCV64 hosts for research purposes. Based on the discussion in pull #7387, this patch refactors some of the original code to support multiple CPU architectures.

Related tests have passed on x86 and RISC-V CPUs, and I successfully ran Qwen2.5 on a RISC-V CPU:
```bash
(myenv) [root@openeuler-riscv64 DeepSpeed ]$ pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.11.4, pytest-7.2.0, pluggy-1.6.0 -- /root/myenv/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /root/ecosystem/DeepSpeed/tests, configfile: pytest.ini
plugins: mock-3.14.1, hypothesis-6.135.14, forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED                                                                              [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED                                                                              [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED                                                                              [100%]

(myenv) root@ubuntu-2204:~/soft-working-dir/DeepSpeed# pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.12.3, pytest-7.2.0, pluggy-1.6.0 -- /root/soft-working-dir/myenv/bin/python3
cachedir: .pytest_cache
rootdir: /root/soft-working-dir/DeepSpeed/tests, configfile: pytest.ini
plugins: forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED                                                                              [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED                                                                              [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED                                                                              [100%]

```

---------

Signed-off-by: heyujiao99 <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Ma, Guokai <[email protected]>
Flakes342 pushed a commit to Flakes342/DeepSpeed that referenced this pull request Sep 9, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025