Remove +PTX from CUDA 12.8 builds #157634

pytorchbot · 2025-07-04T18:15:04Z

Remove +PTX from CUDA 12.8 builds, plus a small refactor in build_cuda.sh.
Removing +PTX reduces the binary size, which is required in order to upload the binaries to PyPI.

Pull Request resolved: #157516
Approved by: https://github.com/malfet, https://github.com/ptrblck, https://github.com/tinglvv

(cherry picked from commit 8408522)
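For readers unfamiliar with the `+PTX` suffix, here is a minimal Python sketch of what dropping it from a semicolon-separated architecture list looks like. It is an illustration only: the real change lives in the shell script build_cuda.sh, which is not shown on this page. The helper name `strip_ptx` is hypothetical, and the example list is borrowed from the arch list quoted in #157791 rather than taken from this PR's diff.

```python
def strip_ptx(arch_list: str) -> str:
    """Drop a trailing +PTX from every entry of a semicolon-separated arch list.

    Hypothetical helper for illustration; not part of build_cuda.sh.
    """
    suffix = "+PTX"
    cleaned = []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty segments such as a trailing ';'
        if entry.endswith(suffix):
            entry = entry[: -len(suffix)]
        cleaned.append(entry)
    return ";".join(cleaned)


# Illustrative value only (assumed, not read from the actual build script).
with_ptx = "7.5;8.0;8.6;9.0;10.0;12.0+PTX"
without_ptx = strip_ptx(with_ptx)
print(without_ptx)
```

Without the `+PTX` entry the binary carries only precompiled device code for the listed architectures, which is what makes it smaller.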

pytorch-bot · 2025-07-04T18:15:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157634

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 54 Pending

As of commit 2d381f1 with merge base 3a7ff82:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

⏳ pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable) (gh) (#153987)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

atalman

lgtm

…57791)

NVCC apparently has had a [compression-mode flag](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#compress-mode-default-size-speed-balance-none-compress-mode) since 12.4 that tells it how to compress the fatbinary. This mode defaults to speed (pick a low compression mode that loads the file quickly). Since we are running into PyPI size issues, this will allow us to upload smaller wheel files.

From: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#compress-mode-default-size-speed-balance-none-compress-mode

```
size
Uses a compression mode more focused on reduced binary size, at the cost of compression and decompression time.
```

Up to 37.2% reduction in binary size with virtually no drawback (except potentially a little slower loading of the .so at PyTorch startup).

694 MB for CUDA 12.9 builds with 6.0;7.0;7.5;8.0;8.6;9.0;10.0;12.0+PTX vs 1.08 GB for CUDA 12.9 builds with 7.5;8.0;8.6;9.0;10.0;12.0+PTX

CUDA 12.9 ***694MB*** vs ***1.08GB***
CUDA 12.8 ***604MB*** vs ***845MB***

This ends up saving PyPI.org approximately 19.6 PiB of bandwidth per month for the CUDA 12.9 case.

This will also allow us to add back CUDA 12.8 12.0+PTX, which will make the package forward compatible on newer GPUs, undoing the need for PR #157516 and #157634.

[Image: Screenshot 2025-07-08 at 5 36 44 PM]

More details can be found in Nvidia's technical blog for CUDA 12.4: https://developer.nvidia.com/blog/runtime-fatbin-creation-using-the-nvidia-cuda-toolkit-12-4-compiler/

Pull Request resolved: #157791
Approved by: https://github.com/malfet, https://github.com/atalman
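The size figures quoted above can be checked with a short calculation. The sketch below is not from the PR. The helper name `percent_reduction` is hypothetical, and treating 1 GB as 1024 MB is an assumption, chosen because it reproduces the quoted 37.2% figure for CUDA 12.9.

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Return the size reduction from before_mb to after_mb as a percentage."""
    return (before_mb - after_mb) / before_mb * 100.0


# Assumption: the 1.08 GB figure is converted with 1 GB = 1024 MB.
MB_PER_GB = 1024

# Sizes as quoted in the commit message of #157791.
cuda_12_9 = percent_reduction(1.08 * MB_PER_GB, 694)
cuda_12_8 = percent_reduction(845, 604)

print(f"CUDA 12.9: {cuda_12_9:.1f}% smaller")
print(f"CUDA 12.8: {cuda_12_8:.1f}% smaller")
```

Under that assumption the CUDA 12.9 case comes out at about 37.2% and the CUDA 12.8 case at about 28.5%, consistent with the "up to 37.2%" wording.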

pytorchbot requested a review from a team as a code owner July 4, 2025 18:15

pytorchbot mentioned this pull request Jul 4, 2025

[v.2.8.0] Release Tracker #156745

Open

pytorchbot mentioned this pull request Jul 4, 2025

Remove +PTX from CUDA 12.8 builds #157516

Closed

pytorchbot added the open source label Jul 4, 2025

atalman approved these changes Jul 4, 2025

atalman merged commit 3db12a5 into release/2.8 Jul 4, 2025
47 of 98 checks passed

Skylion007 mentioned this pull request Jul 10, 2025

[BE]: Reduce binary size 40% using aggressive fatbin compression. #157791

Closed

atalman mentioned this pull request Jul 23, 2025

Release 2.8.0 validations checklist and cherry-picks #158939

Closed

79 tasks

github-actions bot deleted the cherry-pick-157516-by-pytorch_bot_bot_ branch August 4, 2025 02:21

3 participants
