@mikaylagawarecki (Contributor) commented Feb 13, 2025

Follow-up to #145748, which turned USE_CUFILE on for the CUDA 12.6 and 12.8 binaries.

Stack from ghstack (oldest at bottom):

@pytorch-bot (bot) commented Feb 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147120

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 3 Pending

As of commit 5768d7f with merge base f95bdf5:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mikaylagawarecki added a commit that referenced this pull request Feb 13, 2025
ghstack-source-id: d9b1055
Pull Request resolved: #147120
Follow up to #145748

[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki added release notes: cuda release notes category topic: new features topic category labels Feb 13, 2025
mikaylagawarecki added a commit that referenced this pull request Feb 13, 2025
ghstack-source-id: eba89c9
Pull Request resolved: #147120
@albanD (Collaborator) left a comment

Sounds good, only small nits



Diff context: `def _gds_register_buffer(s: Storage) -> None:` was renamed to `def gds_register_buffer(s: Storage) -> None:`
Collaborator


Are there any users of the old private APIs? Do you want to keep them to avoid breaking existing users, e.g. with an alias like `_gds_register_buffer = gds_register_buffer`?

Contributor Author


0 hits on GitHub outside of repos that are just copies of PyTorch, so I feel OK about breaking BC here.

Collaborator


Ok!

gds_register_buffer
gds_deregister_buffer
GdsFile

Collaborator


Can you add an example of how to use these APIs here, or link to one from here?
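A minimal usage sketch for these three APIs, of the kind requested here. It assumes a Linux machine with a CUDA 12.6/12.8 build of PyTorch compiled with USE_CUFILE and a file path on a filesystem that cuFile can use (GdsFile opens files with O_DIRECT). The `GdsFile(filename, flags)` constructor and the `save_storage`/`load_storage` methods follow the `torch.cuda.gds` documentation as published for the public API; verify them against your installed version. The helper name `gds_roundtrip` is hypothetical.

```python
import os
import sys
from typing import Optional


def gds_roundtrip(path: str) -> Optional[bool]:
    """Save a CUDA tensor to `path` through cuFile and read it back.

    Returns None when GDS cannot be exercised here: non-Linux platform,
    torch not installed, a torch build without the public names, or no CUDA.
    """
    if not sys.platform.startswith("linux"):
        return None  # cuFile wheels are only published for Linux
    try:
        import torch
        from torch.cuda.gds import GdsFile, gds_deregister_buffer, gds_register_buffer
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    src = torch.randn(1024, device="cuda")
    dest = torch.empty_like(src)

    # Optionally register the destination storage as a cuFile buffer up front.
    gds_register_buffer(dest.untyped_storage())
    try:
        f = GdsFile(path, os.O_CREAT | os.O_RDWR)
        f.save_storage(src.untyped_storage(), offset=0)  # GPU memory -> file
        f.load_storage(dest.untyped_storage(), offset=0)  # file -> GPU memory
        del f  # GdsFile does its cleanup in __del__
    finally:
        gds_deregister_buffer(dest.untyped_storage())
    return torch.equal(src, dest)
```

Usage: `gds_roundtrip("/mnt/nvme/gds_demo.bin")` (placeholder path) should return `True` on a supported setup and `None` elsewhere. Buffer registration is shown for completeness; cuFile does not require it, and whether it helps depends on the workload.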

@albanD (Collaborator) left a comment


Moving the API to public sounds good.
Let's plan on having an end-to-end tutorial (including any serialization config it needs) before 2.7.

mikaylagawarecki added a commit that referenced this pull request Feb 13, 2025
ghstack-source-id: 359954d
Pull Request resolved: #147120
mikaylagawarecki added a commit that referenced this pull request Feb 14, 2025
ghstack-source-id: 613ad91
Pull Request resolved: #147120
@mikaylagawarecki (Contributor Author)

@pytorchbot merge

@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review February 14, 2025 14:40
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 14, 2025
@mikaylagawarecki mikaylagawarecki requested review from a team, eqy and syed-ahmed as code owners February 14, 2025 14:40
@mikaylagawarecki mikaylagawarecki removed request for a team and eqy February 14, 2025 14:41
@mikaylagawarecki mikaylagawarecki removed the request for review from syed-ahmed February 14, 2025 14:41
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Feb 27, 2025
Follow-up to #147120.
cuFile is only enabled on Linux: https://pypi.org/project/nvidia-cufile-cu12/#files
Fixes validation workflow failures: https://github.com/pytorch/test-infra/actions/runs/13558218752/job/37896578837

```
 File "C:\Jenkins\Miniconda3\envs\conda-env-13558218752\lib\site-packages\torch\cuda\gds.py", line 105, in __init__
    raise RuntimeError("GdsFile is not supported on this platform.")
RuntimeError: GdsFile is not supported on this platform.
Exception ignored in: <function GdsFile.__del__ at 0x000001772B5003A0>
Traceback (most recent call last):
  File "C:\Jenkins\Miniconda3\envs\conda-env-13558218752\lib\site-packages\torch\cuda\gds.py", line 113, in __del__
    if self.handle is not None:
AttributeError: 'GdsFile' object has no attribute 'handle'
```

Pull Request resolved: #148060
Approved by: https://github.com/mikaylagawarecki
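The log in this commit message shows a general Python pitfall: when `__init__` raises before an attribute is assigned, `__del__` still runs on the half-built object and fails on the missing attribute. The sketch below reproduces it with hypothetical `UnsafeWrapper`/`SafeWrapper` classes (stand-ins, not `GdsFile` itself, and not necessarily what #148060 changed) and shows one defensive pattern: assign the attribute before anything that can raise, and read it with `getattr` in `__del__`.

```python
import gc
import sys
from typing import List, Optional

released: List[int] = []  # handles "released" by __del__, recorded for the demo


class UnsafeWrapper:
    """Mirrors the failing pattern: `handle` is only set after the platform check."""

    def __init__(self, supported: bool) -> None:
        if not supported:
            raise RuntimeError("not supported on this platform")
        self.handle: Optional[int] = 42  # stand-in for a real resource handle

    def __del__(self) -> None:
        if self.handle is not None:  # AttributeError if __init__ raised early
            released.append(self.handle)


class SafeWrapper:
    """Same object, but `handle` exists before anything in __init__ can raise."""

    def __init__(self, supported: bool) -> None:
        self.handle: Optional[int] = None
        if not supported:
            raise RuntimeError("not supported on this platform")
        self.handle = 42

    def __del__(self) -> None:
        # getattr also covers objects created via __new__ without __init__.
        handle = getattr(self, "handle", None)
        if handle is not None:
            released.append(handle)


def del_errors(cls: type) -> List[str]:
    """Build `cls` on an 'unsupported' platform; return errors raised in __del__."""
    errors: List[str] = []
    old_hook = sys.unraisablehook
    # Exceptions in __del__ are not propagated; Python reports them via this hook.
    sys.unraisablehook = lambda args: errors.append(args.exc_type.__name__)
    try:
        try:
            cls(supported=False)
        except RuntimeError:
            pass
        gc.collect()  # ensure the half-built object has been finalized
    finally:
        sys.unraisablehook = old_hook
    return errors
```

Without the temporary hook, the `UnsafeWrapper` case prints the same kind of `Exception ignored in: <function ...__del__ ...>` message as the CI log, while `SafeWrapper` finalizes quietly.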
aditew01 pushed a commit that referenced this pull request Feb 28, 2025
@github-actions github-actions bot deleted the gh/mikaylagawarecki/313/head branch March 25, 2025 02:15

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: cuda release notes category topic: new features topic category

4 participants