Skip to content

Conversation

@guangyey
Copy link
Collaborator

@guangyey guangyey commented Oct 10, 2024

Stack from ghstack (oldest at bottom):

Motivation

fix #137694, libtroch_cuda.so is missing in TORCH_LIBRARIES and TORCH_CUDA_LIBRARIES in PyTorch CMake

If build flag BUILD_SHARED_LIBS is disabled, libtorch_cuda.so will be added into TORCH_LIBRARIES here
If build flag BUILD_SHARED_LIBS is enabled, libtorch_cuda,so is missing here

Additional Context

find_library(TORCH_CUDA_LIBRARY intends to export libtorch_cuda.so to TORCH_LIBRARIES.

Why do we prefer find_library(TORCH_CUDA_LIBRARY instead of append_torchlib_if_found?
append_torchlib_if_found doesn't update TORCH_CUDA_LIBRARIES. And we expect libtorch_cuda.so to be added to both TORCH_CUDA_LIBRARIES and TORCH_LIBRARIES to facilitate user experience.

cc @malfet @seemethere @ptrblck @msaroufim

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137693

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ab3b81d with merge base 5c3ba6f (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Oct 10, 2024
ghstack-source-id: 1ad886e
Pull Request resolved: #137693
@guangyey guangyey requested a review from malfet October 10, 2024 14:47
@guangyey guangyey added the release notes: cuda release notes category label Oct 10, 2024
guangyey added a commit that referenced this pull request Oct 10, 2024
ghstack-source-id: 1ad886e
Pull Request resolved: #137693
guangyey added a commit that referenced this pull request Oct 10, 2024
ghstack-source-id: 1ad886e
Pull Request resolved: #137693
guangyey added a commit that referenced this pull request Oct 10, 2024
ghstack-source-id: a0aa4ae
Pull Request resolved: #137693
guangyey added a commit that referenced this pull request Oct 10, 2024
ghstack-source-id: 1ad886e
Pull Request resolved: #137693
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@guangyey guangyey added module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general labels Oct 11, 2024
@guangyey guangyey added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 15, 2024
guangyey added a commit that referenced this pull request Oct 15, 2024
ghstack-source-id: bae6933
Pull Request resolved: #137693
[ghstack-poisoned]
@guangyey
Copy link
Collaborator Author

@malfet could you help review this PR?

@guangyey
Copy link
Collaborator Author

@malfet Sorry for this. Just a reminder in case this series PR is ignored.

@guangyey
Copy link
Collaborator Author

guangyey commented Nov 5, 2024

@malfet May I know if this code change is reasonable to you?

@malfet
Copy link
Contributor

malfet commented Nov 5, 2024

@malfet May I know if this code change is reasonable to you?

I still don't understand, what problem this PR is trying to solve? Can you please add a regression test or something? TORCH_LIBRARY should only contain reference to libtorch, which is a proxy library for both torch_cpu and torch_cuda. Why one needs to add explicit dependency to libtorch_cuda.so? When depending on libtorch.so is not enough?

@guangyey
Copy link
Collaborator Author

guangyey commented Nov 5, 2024

I go through the CMake code again. In my understanding, these pieces of code seem to find all libs related to torch such as torch_cpu, torch_cuda, c10, c10_cuda, and torch to TORCH_LIBRARY, and even nnpack, mkldnn, kineto these libs that PyTorch depends on. In most situations, libtorch.so, a proxy lib, is enough to find the dependency, but sometimes might be over-enough.
This PR intends to add torch_cuda to TORCH_LIBRARY in BUILD_SHARED_LIBS mode to flexibly facilitate the user experience in the situation where the user only wants to link torch_cuda. I could take only link torch_cpu as an example: ipex CPU code only would like to link libtorch_cpu rather than libtorch_cpu and libtorch_cuda.
If this PR code change is not reasonable to you, I would like to close it.

@malfet
Copy link
Contributor

malfet commented Nov 7, 2024

If this PR code change is not reasonable to you, I would like to close it.

There are two use-cases in my mind:

  • C++ API users, who should never link with anything directly against libtorch_cuda or c10 or kineto, as implementation of all methods decorated with TORCH_API should be in libtorch_cpu.so
  • Extension developers who might need to link to kineto, c10_cuda, cudnn etc should write their scripts in a way to explicitly search for those dependencies, as pytorch might change the underlying structure any time, but if they need something from the 3rd party libraries, they should not assume they'll be available thru PyTorch

@guangyey
Copy link
Collaborator Author

guangyey commented Nov 8, 2024

Do you mean if C++ user would like to use some APIs provided by libtorch_cuda.so such as getCUDADeviceAllocator. It is enough to link against libtorch.so rather than libtorch_cuda.so. If yes, I think you are right!

@guangyey guangyey closed this Nov 25, 2024
@github-actions github-actions bot deleted the gh/guangyey/72/head branch December 26, 2024 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general open source release notes: cuda release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants