[c10d] Add NCCL memory allocator #145675
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145675
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 37 Pending as of commit d310c91 with merge base d79c6f4.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
test/distributed/test_c10d_nccl.py (Outdated)

```python
# Use NCCL memory allocator
allocator = c10d.nccl_mem_allocator
pool = torch.cuda.MemPool(allocator.allocator())
```
Should we make this generic? (for *ccl vendors)
Yeah, we should. Maybe later :) (More framework design needed)
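For illustration only, a generic lookup might look roughly like the sketch below; the `_ALLOCATORS` registry and `get_mem_allocator` helper are hypothetical names, not anything in this PR:

```python
# Hypothetical sketch only -- not part of this PR. It shows how a
# vendor-neutral lookup could sit in front of per-backend allocators,
# so callers would not need to import NCCL-specific symbols.
import torch.distributed as dist

_ALLOCATORS = {
    # This PR adds the NCCL one; other *ccl vendors could register here.
    # The lambda defers attribute access until the allocator is requested.
    "nccl": lambda: dist.nccl_mem_allocator,
}

def get_mem_allocator(backend_name: str):
    """Hypothetical helper: return the pluggable allocator for a backend."""
    try:
        return _ALLOCATORS[backend_name]()
    except KeyError:
        raise ValueError(f"no registered allocator for backend {backend_name!r}")
```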
The API looks nice. I have two high-level questions about the direction.
syed-ahmed left a comment:
LGTM!
This PR implements a small UI improvement over #133603. It prepares an NCCL memory allocator in torch cpp and then pybinds it out, so that users can directly use it. UI:

```
pool = torch.cuda.MemPool(dist.nccl_mem_allocator)
with torch.cuda.use_mem_pool(pool):
    tensor = torch.arange(1024 * 1024 * 2, device=device)
```

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
wconstab left a comment:
Thanks for the changes! This seems good to me, and it might be enough just as it is. If we eventually want a smoother API for getting an allocator, I think we could build that strictly in the Python layer later on and use these APIs under the hood.
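For example, such a Python-layer wrapper might look roughly like this; a sketch only, where `nccl_mem` is a hypothetical name and the `dist.nccl_mem_allocator` binding is the one from the PR description:

```python
# Hypothetical sketch of the "smoother API" idea -- not part of this PR.
# A thin Python-layer context manager could hide the MemPool plumbing and
# use the bindings added here under the hood.
import contextlib
import torch
import torch.distributed as dist

@contextlib.contextmanager
def nccl_mem(device=None):
    """Route allocations made inside this scope through the NCCL allocator."""
    pool = torch.cuda.MemPool(dist.nccl_mem_allocator)  # binding from this PR
    with torch.cuda.use_mem_pool(pool, device=device):
        yield pool

# Usage: tensors created inside the scope come from NCCL-allocated memory.
# with nccl_mem():
#     buf = torch.arange(1024, device="cuda")
```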
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -c ghfirst -m "Sorry but this still fails internally. See D68866823 for details"
@pytorchbot successfully started a revert job. Check the current status here.
@kwen2501 your PR has been successfully reverted.
This reverts commit 18a7a04. Reverted #145675 on behalf of @ZainRizvi due to: "Sorry but this still fails internally. See D68866823 for details."
This PR implements a small UI improvement over #133603. It prepares an NCCL memory allocator in torch cpp and then pybinds it out, so that users can directly use it. UI:

```
pool = torch.cuda.MemPool(backend.mem_allocator)
with torch.cuda.use_mem_pool(pool):
    tensor = torch.arange(1024 * 1024 * 2, device=device)
```

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
@ZainRizvi Arg, sorry, I forgot to add an `#else`.
@pytorchbot merge -f "Adding #else to fix internal compilation error"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
This PR implements a small UI improvement over #133603.
It prepares an NCCL memory allocator in torch cpp and then pybinds it out, so that users can directly use it.
UI:
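```
pool = torch.cuda.MemPool(backend.mem_allocator)
with torch.cuda.use_mem_pool(pool):
    tensor = torch.arange(1024 * 1024 * 2, device=device)
```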
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o
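For context, here is a fuller end-to-end sketch assembled from the snippets in this thread. The torchrun-style setup and the private `_get_backend` accessor are assumptions for illustration, not part of this PR's description:

```python
# Sketch only: end-to-end use of the new allocator binding, assuming a
# torchrun launch (RANK/WORLD_SIZE/MASTER_ADDR set in the environment).
import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])
    torch.cuda.set_device(rank)
    dist.init_process_group(backend="nccl")

    # Reach the ProcessGroupNCCL backend object; _get_backend is a private
    # accessor, used here purely for illustration.
    backend = dist.group.WORLD._get_backend(torch.device("cuda"))

    # Tensors created inside the pool scope come from NCCL-allocated memory,
    # which NCCL can register for more efficient communication.
    pool = torch.cuda.MemPool(backend.mem_allocator)
    with torch.cuda.use_mem_pool(pool):
        tensor = torch.arange(1024 * 1024 * 2, device="cuda")

    dist.all_reduce(tensor)  # the collective now runs on NCCL-owned buffers
    torch.cuda.synchronize()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```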