…ating bfloat16 with multimem.ld_reduce

This provides better accuracy without additional cost.

[ghstack-poisoned]
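The accuracy claim can be illustrated without a GPU. Because the PR title above is truncated, the following is an assumption: that the change keeps the running sum in fp32 instead of rounding each partial sum back to bfloat16. This pure-Python sketch emulates bfloat16 by rounding float32 bit patterns to their upper 16 bits (round-to-nearest-even). The helpers `to_fp32`, `to_bf16`, `reduce_bf16_acc`, and `reduce_fp32_acc` are hypothetical and are not PyTorch or PTX APIs. Each list element stands for one rank's contribution, and the reduction is modeled sequentially, which simplifies what `multimem.ld_reduce` does in hardware.

```python
import struct
from typing import Iterable


def to_fp32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 (round-to-nearest-even on the low 16 bits of float32).

    Simplified emulation: goes through float32 first and ignores NaN inputs.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                      # lowest bit that bfloat16 keeps
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000   # round, then drop 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reduce_bf16_acc(values: Iterable[float]) -> float:
    """Sum where every partial sum is rounded back to bfloat16."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + v)
    return acc


def reduce_fp32_acc(values: Iterable[float]) -> float:
    """Sum in float32, rounding to bfloat16 only once at the end."""
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + v)
    return to_bf16(acc)


if __name__ == "__main__":
    # One contribution per rank; every input is exactly representable in bf16.
    contributions = [256.0] + [1.0] * 7    # exact sum: 263
    print(reduce_bf16_acc(contributions))  # 256.0: each +1 is rounded away
    print(reduce_fp32_acc(contributions))  # 264.0: 263 rounded to bf16 (ties-to-even)
```

Both variants take bfloat16 inputs and produce a bfloat16 output; only the accumulator width differs. Once a bfloat16 running sum reaches 256, adjacent representable values are 2 apart, so adding 1.0 lands on a tie that rounds back to 256 every time.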
pytorch-bot bot commented Oct 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137529

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0603ed5 with merge base 9b2e453:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorchmergebot pushed a commit that referenced this pull request Oct 9, 2024
- Previously, detection would fail if the user had not yet called an API such as `torch.cuda.set_device()`, because the detection logic requires NVML to be initialized. This PR adds explicit NVML initialization (which is idempotent).
- Previously, any NVML error in the detection logic was fatal. Now we issue an informative warning and return a topology that assumes no NVLink connectivity.

Pull Request resolved: #137530
Approved by: https://github.com/Chillee
ghstack dependencies: #137471, #137472, #137473, #137474, #137475, #137529
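The control flow this commit describes (initialize NVML explicitly, and fall back to a no-NVLink topology with a warning if any NVML call fails) can be sketched in Python. Everything here is hypothetical: `detect_nvlink_topology`, `NvmlError`, and the injected `nvml_init` / `query_link_count` callables stand in for real NVML bindings, and the actual PyTorch implementation is not shown in this thread. Injecting the NVML calls keeps the sketch runnable without a GPU.

```python
import warnings
from typing import Callable, List

# Hypothetical alias: topology[i][j] = number of NVLinks between GPUs i and j.
Topology = List[List[int]]


class NvmlError(RuntimeError):
    """Hypothetical stand-in for an error raised by an NVML call."""


def detect_nvlink_topology(
    num_devices: int,
    nvml_init: Callable[[], None],
    query_link_count: Callable[[int, int], int],
) -> Topology:
    try:
        # Initialize NVML explicitly instead of relying on an earlier call such as
        # torch.cuda.set_device(); calling it on every detection is safe only
        # because initialization is idempotent, as the commit message notes.
        nvml_init()
        return [
            [0 if i == j else query_link_count(i, j) for j in range(num_devices)]
            for i in range(num_devices)
        ]
    except NvmlError as exc:
        # Degrade gracefully: warn and report "no NVLink" instead of failing fatally.
        warnings.warn(
            f"NVLink detection failed ({exc}); assuming no NVLink connectivity.",
            RuntimeWarning,
        )
        return [[0] * num_devices for _ in range(num_devices)]


if __name__ == "__main__":
    print(detect_nvlink_topology(2, lambda: None, lambda i, j: 4))  # [[0, 4], [4, 0]]
```

Returning an all-zero matrix rather than raising lets callers treat a detection failure like a machine without NVLink. The trade-off is that NVLink may go unused on a system where NVML misbehaved.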
yifuwang pushed a commit to yifuwang/pytorch that referenced this pull request Feb 22, 2025
…ating bfloat16 with multimem.ld_reduce

This provides better accuracy without additional cost.

ghstack-source-id: 7b8c55d
Pull Request resolved: pytorch#137529

Labels: `Merged`, `oncall: distributed`, `release notes: distributed (c10d)`
