Conversation

@heyujiao99 (Contributor)

Hi,
This patch adds riscv64 support for the deepspeed_shm_comm operator, enabling DeepSpeed to perform pure CPU training/inference on RISCV64 hosts for research purposes.

The related unit tests have passed:

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED [100%]

@loadams (Collaborator) commented Jun 25, 2025

@heyujiao99 - would it be worth adding this as a unique accelerator? Or do you foresee this as the only difference compilation-wise?

@heyujiao99 (Contributor, author) commented Jun 26, 2025

> @heyujiao99 - would it be worth adding this as a unique accelerator? Or do you foresee this as the only difference compilation-wise?

In the long run, a unique accelerator is worthwhile because RISC-V CPUs may have acceleration/computation backends that differ from Intel CPUs'. By the way, does the DeepSpeed community have plans to support CPUs of different architectures?

@loadams (Collaborator) commented Jun 27, 2025

> @heyujiao99 - would it be worth adding this as a unique accelerator? Or do you foresee this as the only difference compilation-wise?
>
> In the long run, a unique accelerator is worthwhile because RISC-V CPUs may have acceleration/computation backends that differ from Intel CPUs'. By the way, does the DeepSpeed community have plans to support CPUs of different architectures?

@heyujiao99 - We are certainly happy to support them if users are able to contribute/validate them; we don't have any to test with ourselves. Also @tjruwase, this is similar to our earlier discussion about switching cpu_accelerator to xeon_accelerator and making the existing cpu_accelerator more generic?

@delock (Collaborator) commented Jul 3, 2025

Hi @heyujiao99, do you know whether, in PyTorch, RISCV64 shares the same 'cpu' device type as Intel CPUs? Do you see RISCV64 differences only in OpBuilder, or do you foresee differences in other PyTorch code as well? If most code is the same and only OpBuilder differs, then an abstraction in OpBuilder plus sub-directories under csrc/cpu/, such as csrc/cpu/comm/x86_64 and csrc/cpu/comm/riscv64, might be the better choice.
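The sub-directory layout suggested above could be wired into the build roughly as follows. This is a minimal Python sketch, not DeepSpeed's actual OpBuilder API; the helper name and directory layout are hypothetical illustrations of the idea.

```python
import os
import platform

def arch_specific_sources(base_dir, filenames, machine=None):
    """Return kernel source paths under an architecture sub-directory,
    e.g. csrc/cpu/comm/x86_64/ or csrc/cpu/comm/riscv64/ (hypothetical layout)."""
    # platform.machine() reports the host ISA, e.g. 'x86_64' or 'riscv64'
    machine = machine or platform.machine()
    return [os.path.join(base_dir, machine, f) for f in filenames]
```

On an x86-64 host, `arch_specific_sources("csrc/cpu/comm", ["shm.cpp"])` would resolve to `csrc/cpu/comm/x86_64/shm.cpp`, while the same call on a RISC-V host would pick up the riscv64 variant, keeping the OpBuilder logic architecture-agnostic.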

@delock (Collaborator) commented Jul 9, 2025

@heyujiao99 @loadams @tjruwase following up on this discussion. From the current PR, I see that the main differences between x86-64 CPUs and RISC-V CPUs are what the C++ kernels look like and how to build them. For building the kernels, the OpBuilder base class should have a way to choose the right build flags. For the kernels themselves, there are two situations:

  1. The kernel code differs between x86_64 and riscv; in that case, we need a naming convention to help OpBuilder select the right source code to build. The idea is to give a 'base' name and let OpBuilder add the right prefix or suffix to pick the right kernel code.
  2. The kernel code is the same between x86_64 and riscv; in that case, the same source code can be used.

That said, we could introduce a mechanism that picks kernel source code with the following logic: a. pick 'machine'-specific kernel code if possible; b. pick generic kernel code if no machine-specific code is found; c. raise an error if neither is found.

With this mechanism we should be able to catch most differences between x86_64 and riscv, and it is also extendable to other CPU ISAs if needed.
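The three-step fallback logic could be sketched like this. Again a hypothetical helper, assuming sources live under `<src_root>/<machine>/` with a `generic` fallback directory; the names are illustrative, not DeepSpeed's actual layout.

```python
import os
import platform

def select_kernel_source(src_root, base_name, machine=None):
    """Resolve a kernel source file, preferring machine-specific code.

    a. pick machine-specific code, e.g. <src_root>/riscv64/<base_name>
    b. fall back to generic code,  e.g. <src_root>/generic/<base_name>
    c. raise an error if neither exists
    """
    machine = machine or platform.machine()
    for subdir in (machine, "generic"):
        candidate = os.path.join(src_root, subdir, base_name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        f"no kernel source for {base_name!r} under {src_root!r} "
        f"(tried {machine!r} and 'generic')")
```

With this shape, adding a new ISA means dropping optimized sources into a new sub-directory; existing architectures keep building from `generic` until someone contributes a tuned variant.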

@tjruwase (Contributor) commented Jul 9, 2025

@delock thanks for the analysis. I am aligned with your proposal that we should reuse as much as possible, even if some refactoring is needed.

@heyujiao99, it would be great to get your thoughts on how to minimize code duplication in this PR. In particular, shm-riscv64.cpp appears to duplicate shm.cpp.

@heyujiao99 (Contributor, author)

> @delock thanks for the analysis. I am aligned with your proposal that we should reuse as much as possible, even if some refactoring is needed.
>
> @heyujiao99, it would be great to get your thoughts on how to minimize code duplication in this PR. In particular, shm-riscv64.cpp appears to duplicate shm.cpp.

Thanks for all the discussions. I totally agree, and I will look for a more extendable way to support RISC-V CPUs. However, some dependencies, such as PyTorch and cpuinfo, still lack mature support for RISC-V CPUs; I'll explore this further and attempt it again once I have a deeper understanding.

heyujiao99 closed this Jul 10, 2025
@delock (Collaborator) commented Jul 10, 2025

> @delock thanks for the analysis. I am aligned with your proposal that we should reuse as much as possible, even if some refactoring is needed.
> @heyujiao99, it would be great to get your thoughts on how to minimize code duplication in this PR. In particular, shm-riscv64.cpp appears to duplicate shm.cpp.
>
> Thanks for all the discussions. I totally agree, and I will look for a more extendable way to support RISC-V CPUs. However, some dependencies, such as PyTorch and cpuinfo, still lack mature support for RISC-V CPUs; I'll explore this further and attempt it again once I have a deeper understanding.

Thanks @heyujiao99! Looking forward to your PR. If you need to discuss, ping me on this thread or via email; I'll be happy to see a new architecture running DeepSpeed.

delock added a commit that referenced this pull request Aug 29, 2025
This patch adds riscv64 support for the deepspeed_shm_comm operator, enabling DeepSpeed to perform CPU training/inference on RISCV64 hosts for research purposes. Based on the discussion in pull #7387, this patch refactors some of the original code to support multiple CPU architectures.

Related tests have passed on x86 and RISC-V CPUs, and I successfully ran Qwen2.5 on a RISC-V CPU:
```bash
(myenv) [root@openeuler-riscv64 DeepSpeed ]$ pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.11.4, pytest-7.2.0, pluggy-1.6.0 -- /root/myenv/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /root/ecosystem/DeepSpeed/tests, configfile: pytest.ini
plugins: mock-3.14.1, hypothesis-6.135.14, forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED                                                                              [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED                                                                              [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED                                                                              [100%]

(myenv) root@ubuntu-2204:~/soft-working-dir/DeepSpeed# pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.12.3, pytest-7.2.0, pluggy-1.6.0 -- /root/soft-working-dir/myenv/bin/python3
cachedir: .pytest_cache
rootdir: /root/soft-working-dir/DeepSpeed/tests, configfile: pytest.ini
plugins: forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED                                                                              [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED                                                                              [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED                                                                              [100%]

```

---------

Signed-off-by: heyujiao99 <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Ma, Guokai <[email protected]>
Flakes342 pushed a commit to Flakes342/DeepSpeed that referenced this pull request Sep 9, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025