-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Use pytorch utils to detect ninja #7687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pytorch utils to detect ninja #7687
Conversation
Signed-off-by: Tim Adler <[email protected]>
1a7f682 to
e826740
Compare
|
To add, not only this dependency creates complexity to these environments, it is a blocker for creating standalone conda packages depending on DS - for example for openfold-3, as described in the issue. Thanks a lot for your consideration. |
|
@Emrys-Merlin thanks for this contribution. What is the earliest pytorch version can detect ninja? |
|
I have looked a bit and at least it is in old pytorch versions (1.8.1 and likely earlier). Does deepspeed aims at supporting very old versions of pytorch? I think to be in the safe side we should then add this simple logic here, maybe completely replacing using pytorch's or as a fallback for old versions if import fails. What do you think? |
|
@sdvillal thanks for the clarification. No, we don't plan to support very old pytorch versions. This addresses my concerns. |
This PR replaces the ninja detection based on `import ninja` with the corresponding pytorch utils to detect ninja. The motivation behind this change is twofold. 1. The detection based on importing the python interface works in the pip-world, but fails in the conda-world, as the corresponding ninja conda package does not include the python interface. The implication on detection is [known](conda-forge/ninja-feedstock#26) and the recommended solution is to use the `subprocess` module to detect ninja. This approach is followed by, e.g., [meson](https://github.com/mesonbuild/meson-python/blob/1c8092dc477cbc7e1e4d40913608d9daae75f793/mesonpy/__init__.py#L1077-L1088) and [pytorch](https://github.com/pytorch/pytorch/blob/d33d125c9413c5043aa5f74fad909a576288242d/torch/utils/cpp_extension.py#L2312-L2325). As the `subprocess`-based check works in both the pip- and the conda-world, I think, it would make sense to switch over. 2. As ninja is only invoked through pytorch, it should be sufficient to check if pytorch can find ninja. Hence, instead of reimplementing the `subprocess` check, I think it would be better to use the pytorch utils (which implement said test anyways). Without this refactor, every conda environment that depends on DeepSpeed will need to install ninja as a PyPI dependency, even though the conda version of ninja would be sufficient for the compilation. In my opinion, this adds unnecessary complexity to these environments. I tried to keep the changes minimal. As some additional context, @sdvillal and I stumbled over this issue while working on packaging aqlaboratory/openfold-3 for conda-forge. Signed-off-by: Tim Adler <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: rraminen <[email protected]>
This PR replaces the ninja detection based on
import ninjawith the corresponding pytorch utils to detect ninja.The motivation behind this change is twofold.
subprocessmodule to detect ninja. This approach is followed by, e.g., meson and pytorch. As thesubprocess-based check works in both the pip- and the conda-world, I think, it would make sense to switch over.subprocesscheck, I think it would be better to use the pytorch utils (which implement said test anyways).Without this refactor, every conda environment that depends on DeepSpeed will need to install ninja as a PyPI dependency, even though the conda version of ninja would be sufficient for the compilation. In my opinion, this adds unnecessary complexity to these environments.
I tried to keep the changes minimal.
As some additional context, @sdvillal and I stumbled over this issue while working on packaging aqlaboratory/openfold-3 for conda-forge.