Use pytorch utils to detect ninja #7687

Emrys-Merlin · 2025-11-12T13:12:53Z

This PR replaces the ninja detection based on import ninja with the corresponding pytorch utils to detect ninja.

The motivation behind this change is twofold.

The detection based on importing the python interface works in the pip-world, but fails in the conda-world, as the corresponding ninja conda package does not include the python interface. The implication on detection is known and the recommended solution is to use the subprocess module to detect ninja. This approach is followed by, e.g., meson and pytorch. As the subprocess-based check works in both the pip- and the conda-world, I think, it would make sense to switch over.
As ninja is only invoked through pytorch, it should be sufficient to check if pytorch can find ninja. Hence, instead of reimplementing the subprocess check, I think it would be better to use the pytorch utils (which implement said test anyways).

Without this refactor, every conda environment that depends on DeepSpeed will need to install ninja as a PyPI dependency, even though the conda version of ninja would be sufficient for the compilation. In my opinion, this adds unnecessary complexity to these environments.

I tried to keep the changes minimal.

As some additional context, @sdvillal and I stumbled over this issue while working on packaging aqlaboratory/openfold-3 for conda-forge.

Signed-off-by: Tim Adler <[email protected]>

sdvillal · 2025-11-12T15:12:53Z

To add, not only this dependency creates complexity to these environments, it is a blocker for creating standalone conda packages depending on DS - for example for openfold-3, as described in the issue.

Thanks a lot for your consideration.

sfc-gh-truwase · 2025-11-12T19:02:28Z

@Emrys-Merlin thanks for this contribution. What is the earliest pytorch version can detect ninja?

sdvillal · 2025-11-12T22:25:48Z

I have looked a bit and at least it is in old pytorch versions (1.8.1 and likely earlier).

Does deepspeed aims at supporting very old versions of pytorch? I think to be in the safe side we should then add this simple logic here, maybe completely replacing using pytorch's or as a fallback for old versions if import fails.

What do you think?

sfc-gh-truwase · 2025-11-12T22:43:59Z

@sdvillal thanks for the clarification. No, we don't plan to support very old pytorch versions. This addresses my concerns.

@sdvillal

This PR replaces the ninja detection based on `import ninja` with the corresponding pytorch utils to detect ninja. The motivation behind this change is twofold. 1. The detection based on importing the python interface works in the pip-world, but fails in the conda-world, as the corresponding ninja conda package does not include the python interface. The implication on detection is [known](conda-forge/ninja-feedstock#26) and the recommended solution is to use the `subprocess` module to detect ninja. This approach is followed by, e.g., [meson](https://github.com/mesonbuild/meson-python/blob/1c8092dc477cbc7e1e4d40913608d9daae75f793/mesonpy/__init__.py#L1077-L1088) and [pytorch](https://github.com/pytorch/pytorch/blob/d33d125c9413c5043aa5f74fad909a576288242d/torch/utils/cpp_extension.py#L2312-L2325). As the `subprocess`-based check works in both the pip- and the conda-world, I think, it would make sense to switch over. 2. As ninja is only invoked through pytorch, it should be sufficient to check if pytorch can find ninja. Hence, instead of reimplementing the `subprocess` check, I think it would be better to use the pytorch utils (which implement said test anyways). Without this refactor, every conda environment that depends on DeepSpeed will need to install ninja as a PyPI dependency, even though the conda version of ninja would be sufficient for the compilation. In my opinion, this adds unnecessary complexity to these environments. I tried to keep the changes minimal. As some additional context, @sdvillal and I stumbled over this issue while working on packaging aqlaboratory/openfold-3 for conda-forge. Signed-off-by: Tim Adler <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: rraminen <[email protected]>

Emrys-Merlin requested review from jomayeri, loadams and tjruwase as code owners November 12, 2025 13:12

Use pytorch utils to detect ninja

e826740

Signed-off-by: Tim Adler <[email protected]>

Emrys-Merlin force-pushed the refactor/ninja_discovery branch from 1a7f682 to e826740 Compare November 12, 2025 13:15

Merge branch 'master' into refactor/ninja_discovery

0626d2d

sfc-gh-truwase approved these changes Nov 12, 2025

View reviewed changes

sfc-gh-truwase enabled auto-merge (squash) November 12, 2025 22:45

sfc-gh-truwase merged commit 4cb26ca into deepspeedai:master Nov 12, 2025
13 checks passed

Emrys-Merlin mentioned this pull request Nov 13, 2025

Modernize conda environment aqlaboratory/openfold-3#34

Draft

sdvillal mentioned this pull request Dec 30, 2025

Deepspeed v0.18.3 conda-forge/deepspeed-feedstock#102

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use pytorch utils to detect ninja #7687

Use pytorch utils to detect ninja #7687

Uh oh!

Emrys-Merlin commented Nov 12, 2025

Uh oh!

sdvillal commented Nov 12, 2025

Uh oh!

sfc-gh-truwase commented Nov 12, 2025

Uh oh!

sdvillal commented Nov 12, 2025

Uh oh!

sfc-gh-truwase commented Nov 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Use pytorch utils to detect ninja #7687

Use pytorch utils to detect ninja #7687

Uh oh!

Conversation

Emrys-Merlin commented Nov 12, 2025

Uh oh!

sdvillal commented Nov 12, 2025

Uh oh!

sfc-gh-truwase commented Nov 12, 2025

Uh oh!

sdvillal commented Nov 12, 2025

Uh oh!

sfc-gh-truwase commented Nov 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants