Add cufile to list of libraries to preload #148137

atalman · 2025-02-28T00:01:54Z

Tested with the almalinux/9-base:latest image:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 401, in <module>
    from torch._C import *  # noqa: F403
ImportError: libcufile.so.0: cannot open shared object file: No such file or directory
>>> exit()
[root@18b37257e416 /]# vi /usr/local/lib64/python3.9/site-packages/torch/__init__.py
[root@18b37257e416 /]# python3
Python 3.9.19 (main, Sep 11 2024, 00:00:00) 
[GCC 11.5.0 20240719 (Red Hat 11.5.0-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.7.0.dev20250227+cu126'
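The `ImportError` above is raised by the dynamic loader while `torch._C` is being loaded: nothing on its search path provides the `libcufile.so.0` soname. Below is a minimal sketch for checking that in isolation without importing torch. It assumes a Linux/glibc system and uses only `ctypes`. The helper name `can_dlopen` is hypothetical.

```python
import ctypes


def can_dlopen(soname: str) -> bool:
    """Return True if the dynamic loader can resolve and load `soname`.

    Uses the loader's default search order (on Linux: LD_LIBRARY_PATH,
    the ld.so cache, then system directories). A library that only lives
    inside a pip-installed nvidia/* package is typically not on that path.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    print("libcufile.so.0 loadable:", can_dlopen("libcufile.so.0"))
```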

pytorch-bot · 2025-02-28T00:01:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148137

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Long queue for macOS runners

⏳ No Failures, 47 Pending

As of commit 9dc23f3 with merge base fc78192:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

⏳ pull / unstable-linux-focal-cuda12.4-py3.10-gcc9-sm89-xfail / build (gh) (#147642)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet · 2025-02-28T00:34:05Z

@pytorchbot merge -f "What can go wrong"

pytorchmergebot · 2025-02-28T00:35:33Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


mikaylagawarecki · 2025-02-28T00:42:44Z

torch/__init__.py

            "cufft": "libcufft.so.*[0-9]",
            "curand": "libcurand.so.*[0-9]",
            "nvjitlink": "libnvJitLink.so.*[0-9]",
+            "cufile": "libcufile.so.*[0-9]",


Thank you for fixing this!!

I don't fully understand how this bit of code works 😅, but since we only have cufile as a dependency in the CUDA 12.6 and 12.8 binaries, do we need an if statement for it here?

Also, do we need to check that the platform is not Windows?

Otherwise, this might break the other binaries in the same way as happened for 2.5.0 :( #138324
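Roughly, each entry in the dictionary in the diff above maps a folder under the pip-installed `nvidia/` package to a glob for the shared library in that folder's `lib/` directory. The idea is to locate the file via `sys.path` and load it with `RTLD_GLOBAL`, so a later load of `torch._C` can resolve the soname. The following is a simplified sketch of that idea, not the actual code in `torch/__init__.py`. The helpers `find_cuda_lib` and `preload_cuda_lib` are hypothetical, and the `nvidia/<folder>/lib` layout is an assumption about how the NVIDIA wheels are installed. Only the pattern strings are copied from the diff.

```python
import ctypes
import glob
import os
import sys
from typing import Iterable, Optional

# Subset of the folder -> library-glob mapping shown in the diff above.
CUDA_LIBS = {
    "cufft": "libcufft.so.*[0-9]",
    "curand": "libcurand.so.*[0-9]",
    "nvjitlink": "libnvJitLink.so.*[0-9]",
    "cufile": "libcufile.so.*[0-9]",
}


def find_cuda_lib(
    folder: str, pattern: str, search_paths: Optional[Iterable[str]] = None
) -> Optional[str]:
    """Return the first file matching <path>/nvidia/<folder>/lib/<pattern>."""
    for base in (search_paths if search_paths is not None else sys.path):
        candidate = os.path.join(base, "nvidia", folder, "lib", pattern)
        matches = sorted(glob.glob(candidate))
        if matches:
            return matches[0]
    return None


def preload_cuda_lib(folder: str, pattern: str) -> bool:
    """Load the library with RTLD_GLOBAL so later dlopen() calls see its soname."""
    path = find_cuda_lib(folder, pattern)
    if path is None:
        return False
    ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    return True
```

In this sketch, a build whose environment never installs the cufile package simply finds nothing for that entry. How the real preload code should treat a missing entry (skip it, or guard it with a version or platform check) is the question raised in the review.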

atalman · 2025-02-28

@mikaylagawarecki It works on CUDA 11.8:

Applied this patch before executing:

>>> import torch
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.7.0.dev20250227+cu118'

I believe this patch fixes exactly this issue. In 2.5.1, cufile was not installed via PyPI; this time it is, so we just preload the library from the correct path.

Follow-up to #148137: make sure we don't try to load cufile on CUDA 11.8.

Test:
```
>>> import torch
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.7.0.dev20250227+cu118'
>>>
```

Pull Request resolved: #148184
Approved by: https://github.com/mikaylagawarecki
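The guard described in this follow-up can be pictured as building the preload table conditionally. Below is a minimal sketch that assumes the CUDA version string has the `"<major>.<minor>"` form reported by `torch.version.cuda` (for example `"11.8"` or `"12.6"`, and `None` for CPU-only builds). It also assumes cufile should only be preloaded on Linux builds with CUDA 12.6 or newer, following the review comment that only the 12.6 and 12.8 binaries depend on it. The names `should_preload_cufile` and `preload_table` are hypothetical; this is not the code merged in #148184.

```python
import sys
from typing import Dict, Optional


def should_preload_cufile(cuda_version: Optional[str], platform: str = sys.platform) -> bool:
    """Decide whether cufile belongs in the preload table for this build.

    Assumption: cufile ships as a pip dependency only for Linux CUDA >= 12.6 builds.
    """
    if cuda_version is None or not platform.startswith("linux"):
        # CPU-only build, or a platform where these .so globs do not apply.
        return False
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= (12, 6)


def preload_table(cuda_version: Optional[str], platform: str = sys.platform) -> Dict[str, str]:
    """Return the folder -> glob mapping, adding cufile only when the guard allows it."""
    libs = {
        "cufft": "libcufft.so.*[0-9]",
        "curand": "libcurand.so.*[0-9]",
        "nvjitlink": "libnvJitLink.so.*[0-9]",
    }
    if should_preload_cufile(cuda_version, platform):
        libs["cufile"] = "libcufile.so.*[0-9]"
    return libs
```

Under these assumptions, `preload_table("11.8", "linux")` has no `"cufile"` entry, while `preload_table("12.6", "linux")` does. Keeping the decision in a separate predicate also covers the Windows question from the review in one place.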

Seeing the following with the arm cu128 nightly:

  File "/usr/local/lib/python3.12/site-packages/torch/__init__.py", line 411, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: libcufile.so.0: cannot open shared object file: No such file or directory

Related to #148137: the dependency needs to be copied for the arm build as well.

Pull Request resolved: #148465
Approved by: https://github.com/atalman, https://github.com/abhilash1910
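If the dependency is expected to be copied into the aarch64 wheel, one way to catch a missing copy before publishing is to list the shared objects inside the built wheel, which is a zip archive. The sketch below is hedged: `bundled_libs` is a hypothetical helper, and it only checks file names inside the archive, not whether the bundled copy actually loads.

```python
import zipfile
from fnmatch import fnmatch
from typing import List


def bundled_libs(wheel_path: str, pattern: str) -> List[str]:
    """Return archive members whose base name matches the glob `pattern`."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            # Match on the file name only, ignoring the directory inside the wheel.
            if fnmatch(name.rsplit("/", 1)[-1], pattern)
        )
```

An empty result for a pattern such as `"libcufile.so*"` on a cu126 or cu128 aarch64 wheel would point to the same problem reported here.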

Add cufile to list of libraries to preload

9dc23f3

malfet approved these changes Feb 28, 2025


malfet added the release notes: releng and topic: bug fixes labels Feb 28, 2025

pytorchmergebot added the merging label Feb 28, 2025

pytorchmergebot added the Merged label Feb 28, 2025

pytorchmergebot closed this in 5a14ff8 Feb 28, 2025

pytorchmergebot removed the merging label Feb 28, 2025

mikaylagawarecki reviewed Feb 28, 2025


atalman mentioned this pull request Feb 28, 2025

Add cuda 11.8 guard for cufile preload #148184

Closed

tinglvv mentioned this pull request Mar 4, 2025

[aarch64] add libcufile for cu126 and cu128 #148465

Closed

github-actions bot deleted the atalman-patch-9 branch March 30, 2025 02:19
