Add cufile to list of libraries to preload #148137
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148137
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below.
⏳ No Failures, 47 Pending
As of commit 9dc23f3 with merge base fc78192:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge -f "What can go wrong"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
    "cufft": "libcufft.so.*[0-9]",
    "curand": "libcurand.so.*[0-9]",
    "nvjitlink": "libnvJitLink.so.*[0-9]",
    "cufile": "libcufile.so.*[0-9]",
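To illustrate how an entry such as `"cufile": "libcufile.so.*[0-9]"` can be used, here is a minimal sketch of a glob-based lookup. It is not the actual code in `torch/__init__.py`: the function name `find_cuda_lib`, the `search_roots` parameter, and the assumed `nvidia/<folder>/lib/` layout of the pip-installed packages are illustrative assumptions.

```python
import glob
import os
import sys
from typing import Iterable, Optional


def find_cuda_lib(
    lib_folder: str,
    lib_pattern: str,
    search_roots: Optional[Iterable[str]] = None,
) -> Optional[str]:
    """Return the first file matching <root>/nvidia/<lib_folder>/lib/<lib_pattern>.

    Hypothetical helper: the directory layout is an assumption about how the
    NVIDIA wheels are unpacked under site-packages.
    """
    roots = sys.path if search_roots is None else search_roots
    for root in roots:
        # The dictionary value is treated as a glob pattern, so
        # "libcufile.so.*[0-9]" matches "libcufile.so.0" but not "libcufile.so".
        pattern = os.path.join(root, "nvidia", lib_folder, "lib", lib_pattern)
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None
```

A caller could pass a non-`None` result to `ctypes.CDLL(path)` to load the library before retrying the import; when nothing matches, the function returns `None` and the caller decides how to report it.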
Thank you for fixing this!!
I don't fully understand how this bit of code works 😅, but since we only have cufile as a dependency in the CUDA 12.6 and 12.8 binaries, do we need an if statement for that here?
Also, do we need to check that the platform is not Windows?
Otherwise, perhaps this would break the other binaries the same way it happened for 2.5.0 :( #138324
@mikaylagawarecki It works on cuda 11.8:
Applied this patch before executing:
>>> import torch
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.7.0.dev20250227+cu118'
I believe this patch fixes exactly this issue. For 2.5.1, cufile was not installed via PyPI. This time it is, so we just preload these libs from the correct path.
Follow up after #148137
Make sure we don't try to load cufile on CUDA 11.8
Test:
```
>>> import torch
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.7.0.dev20250227+cu118'
>>>
```
Pull Request resolved: #148184
Approved by: https://github.com/mikaylagawarecki
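One way to express the "don't load cufile on CUDA 11.8" guard is sketched below. This is a hypothetical illustration, not the change made in #148184: the helper `libs_to_preload`, the constant `CUFILE_CUDA_VERSIONS` (taken from the review comment that only the 12.6 and 12.8 binaries depend on cufile), and the Linux-only check are all assumptions.

```python
from typing import Dict, Optional

# Entries copied from the diff in #148137; the real dictionary has more keys.
CUDA_LIBS: Dict[str, str] = {
    "cufft": "libcufft.so.*[0-9]",
    "curand": "libcurand.so.*[0-9]",
    "nvjitlink": "libnvJitLink.so.*[0-9]",
    "cufile": "libcufile.so.*[0-9]",
}

# Assumption from the review thread: only these binaries depend on cufile.
CUFILE_CUDA_VERSIONS = {"12.6", "12.8"}


def libs_to_preload(cuda_version: Optional[str], system: str) -> Dict[str, str]:
    """Hypothetical helper: pick the libraries worth preloading."""
    # The patterns name .so files, so this sketch only preloads on Linux.
    if system != "Linux" or cuda_version is None:
        return {}
    libs = dict(CUDA_LIBS)
    if cuda_version not in CUFILE_CUDA_VERSIONS:
        # Skip cufile where the binary does not depend on it (e.g. 11.8).
        libs.pop("cufile", None)
    return libs
```

For example, `libs_to_preload("11.8", "Linux")` omits `cufile`, while `libs_to_preload("12.8", "Linux")` keeps it.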
seeing ` File "/usr/local/lib/python3.12/site-packages/torch/__init__.py", line 411, in <module>
from torch._C import * # noqa: F403
^^^^^^^^^^^^^^^^^^^^^^
ImportError: libcufile.so.0: cannot open shared object file: No such file or directory` with arm cu128 nightly.
related to #148137
need to copy the dependency for arm build as well
Pull Request resolved: #148465
Approved by: https://github.com/atalman, https://github.com/abhilash1910
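As a diagnostic aid for errors like the `ImportError` above, the following sketch maps an error message back to the entry of the preload dictionary it mentions. The function name `missing_cuda_libs` is hypothetical, and matching on the library base name (the part of the pattern before the first dot) is an assumption made for illustration.

```python
from typing import Dict, List


def missing_cuda_libs(error_message: str, cuda_libs: Dict[str, str]) -> List[str]:
    """Return the keys whose library base name appears in the error message."""
    return [
        name
        for name, pattern in cuda_libs.items()
        # "libcufile.so.*[0-9]".split(".")[0] is "libcufile"
        if pattern.split(".")[0] in error_message
    ]


cuda_libs = {
    "cufft": "libcufft.so.*[0-9]",
    "curand": "libcurand.so.*[0-9]",
    "nvjitlink": "libnvJitLink.so.*[0-9]",
    "cufile": "libcufile.so.*[0-9]",
}
message = "libcufile.so.0: cannot open shared object file: No such file or directory"
print(missing_cuda_libs(message, cuda_libs))  # prints ['cufile']
```

An empty result means the error does not mention any library in the dictionary, so preloading would not help.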
Fixes: #148120
Test with almalinux/9-base:latest :