[SymmetricMemoryOps] use float32 as the accumulator type when accumulating bfloat16 with multimem.ld_reduce #137529

yifuwang · 2024-10-08T21:29:37Z

Stack from ghstack (oldest at bottom):

Accumulating bfloat16 inputs in float32 provides better accuracy without additional cost.

Also adds documentation to `multimem_one_shot_all_reduce` noting the numerical caveats.
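To illustrate why the accumulator type matters, here is a minimal CPU-side sketch in pure Python. It is not the kernel code from this PR: the helpers `to_bf16`, `to_f32`, `reduce_bf16_accumulator`, and `reduce_f32_accumulator` are hypothetical names introduced for illustration. They emulate bfloat16 and float32 rounding with round-to-nearest-even, and they assume the contributions are summed sequentially, which may not match the order the hardware uses.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    Illustrative only: NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of a float32; round off the low 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reduce_bf16_accumulator(values):
    """Sum bfloat16 values, rounding the running sum to bfloat16 each step."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + v)
    return acc


def reduce_f32_accumulator(values):
    """Sum bfloat16 values in float32, rounding to bfloat16 only at the end."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return to_bf16(acc)


# Hypothetical contributions from 8 ranks; each value is exactly
# representable in bfloat16. The exact sum is 263.
values = [256.0] + [1.0] * 7

print(reduce_bf16_accumulator(values))  # 256.0: every +1 is rounded away
print(reduce_f32_accumulator(values))   # 264.0: nearest bfloat16 to 263
```

With a bfloat16 accumulator, the spacing between representable values near 256 is 2, so each `256 + 1` rounds back to 256 and the small contributions are lost. With a float32 accumulator, only the final result is rounded, so the error stays within one bfloat16 rounding step.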

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

…ating bfloat16 with multimem.ld_reduce This provides better accuracy without additional cost. [ghstack-poisoned]

pytorch-bot · 2024-10-08T21:29:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137529

✅ No Failures

As of commit 0603ed5 with merge base 9b2e453 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

- Previously, the detection would fail if it ran before the user called an API such as `torch.cuda.set_device()`, because the detection logic requires NVML to be initialized. This PR adds explicit NVML initialization (which is idempotent).
- Previously, any NVML error in the detection logic resulted in a fatal error. Now we issue an informative warning and return a topology that assumes no NVLink connectivity.

Pull Request resolved: #137530
Approved by: https://github.com/Chillee
ghstack dependencies: #137471, #137472, #137473, #137474, #137475, #137529

…ating bfloat16 with multimem.ld_reduce This provides better accuracy without additional cost. ghstack-source-id: 7b8c55d Pull Request resolved: pytorch#137529

[SymmetricMemoryOps] use float32 as the accumulator type when accumul…

0603ed5

pytorch-bot bot added the `oncall: distributed` and `release notes: distributed (c10d)` labels Oct 8, 2024

yifuwang requested review from Chillee and weifengpy October 8, 2024 21:30

Chillee approved these changes Oct 8, 2024

yifuwang mentioned this pull request Oct 9, 2024

[SymmetricMemory] implement timeout for barrier(), put_signal() and wait_signal() #137643

Closed

pytorchmergebot added the Merged label Oct 9, 2024

pytorchmergebot closed this in fbaf9b6 Oct 9, 2024

yifuwang mentioned this pull request Oct 10, 2024

[fused_scaled_matmul_reduce_scatter] support rowwise scaling #137738

Closed
