Skip to content

Conversation

@Emrys-Merlin
Copy link
Contributor

This PR replaces the ninja detection based on import ninja with the corresponding pytorch utils to detect ninja.

The motivation behind this change is twofold.

  1. The detection based on importing the python interface works in the pip-world, but fails in the conda-world, as the corresponding ninja conda package does not include the python interface. The implication on detection is known and the recommended solution is to use the subprocess module to detect ninja. This approach is followed by, e.g., meson and pytorch. As the subprocess-based check works in both the pip- and the conda-world, I think, it would make sense to switch over.
  2. As ninja is only invoked through pytorch, it should be sufficient to check if pytorch can find ninja. Hence, instead of reimplementing the subprocess check, I think it would be better to use the pytorch utils (which implement said test anyways).

Without this refactor, every conda environment that depends on DeepSpeed will need to install ninja as a PyPI dependency, even though the conda version of ninja would be sufficient for the compilation. In my opinion, this adds unnecessary complexity to these environments.

I tried to keep the changes minimal.

As some additional context, @sdvillal and I stumbled over this issue while working on packaging aqlaboratory/openfold-3 for conda-forge.

@Emrys-Merlin Emrys-Merlin force-pushed the refactor/ninja_discovery branch from 1a7f682 to e826740 Compare November 12, 2025 13:15
@sdvillal
Copy link

To add, not only this dependency creates complexity to these environments, it is a blocker for creating standalone conda packages depending on DS - for example for openfold-3, as described in the issue.

Thanks a lot for your consideration.

@sfc-gh-truwase
Copy link
Collaborator

@Emrys-Merlin thanks for this contribution. What is the earliest pytorch version can detect ninja?

@sdvillal
Copy link

I have looked a bit and at least it is in old pytorch versions (1.8.1 and likely earlier).

Does deepspeed aims at supporting very old versions of pytorch? I think to be in the safe side we should then add this simple logic here, maybe completely replacing using pytorch's or as a fallback for old versions if import fails.

What do you think?

@sfc-gh-truwase
Copy link
Collaborator

@sdvillal thanks for the clarification. No, we don't plan to support very old pytorch versions. This addresses my concerns.

@sfc-gh-truwase sfc-gh-truwase enabled auto-merge (squash) November 12, 2025 22:45
@sfc-gh-truwase sfc-gh-truwase merged commit 4cb26ca into deepspeedai:master Nov 12, 2025
13 checks passed
rraminen pushed a commit to rraminen/DeepSpeed that referenced this pull request Dec 1, 2025
This PR replaces the ninja detection based on `import ninja` with the
corresponding pytorch utils to detect ninja.

The motivation behind this change is twofold.

1. The detection based on importing the python interface works in the
pip-world, but fails in the conda-world, as the corresponding ninja
conda package does not include the python interface. The implication on
detection is
[known](conda-forge/ninja-feedstock#26) and
the recommended solution is to use the `subprocess` module to detect
ninja. This approach is followed by, e.g.,
[meson](https://github.com/mesonbuild/meson-python/blob/1c8092dc477cbc7e1e4d40913608d9daae75f793/mesonpy/__init__.py#L1077-L1088)
and
[pytorch](https://github.com/pytorch/pytorch/blob/d33d125c9413c5043aa5f74fad909a576288242d/torch/utils/cpp_extension.py#L2312-L2325).
As the `subprocess`-based check works in both the pip- and the
conda-world, I think, it would make sense to switch over.
2. As ninja is only invoked through pytorch, it should be sufficient to
check if pytorch can find ninja. Hence, instead of reimplementing the
`subprocess` check, I think it would be better to use the pytorch utils
(which implement said test anyways).

Without this refactor, every conda environment that depends on DeepSpeed
will need to install ninja as a PyPI dependency, even though the conda
version of ninja would be sufficient for the compilation. In my opinion,
this adds unnecessary complexity to these environments.

I tried to keep the changes minimal.

As some additional context, @sdvillal and I stumbled over this issue
while working on packaging aqlaboratory/openfold-3 for conda-forge.

Signed-off-by: Tim Adler <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: rraminen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants