🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162351
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit f8b1b0c with merge base 991e3d0.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Need to update it in more places.
The change looks good, but we would need to upload the NCCL packages to download.pytorch.org (e.g. https://download.pytorch.org/whl/nightly/nvidia-nccl-cu13/) and get signals on ciflow/binaries before we merge.
Branch force-pushed from 2a11c8e to 15a3d77.
@ezyang Ah, you meant the workflows. Fixed those.
Branch force-pushed from 15a3d77 to f8b1b0c.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@eqy
The new NCCL includes a bunch of bugfixes and features, including a reduction in the number of SMs needed by NVLink collectives, as well as some very useful new APIs for SymmetricMemory. It also enables FP8 support for non-reductive operations on pre-sm90 devices.
Pull Request resolved: #162351
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/atalman
This update breaks all NCCL ops on H100 with "no kernel image available" on CUDA 12.9. Note that we cannot use 12.8 for other reasons, and we cannot use 13.0 because our driver version is insufficient, so 12.9 is the only option.
@pytorchbot revert -m "Broke H100 on 12.9" -c nosignal
Reverting out of caution, as H100 is very widely used across our user base.
@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 162351 failed
Reason: Command
Details for Dev Infra team
Raised by workflow job
Revert #162351 as it breaks H100
Pull Request resolved: #164352
Approved by: https://github.com/atalman, https://github.com/malfet
@albanD A quick check does not show any missing architectures. We'll follow up on Slack to get a repro.
Discussed offline: it's an NCCL bug where it incorrectly handles error propagation when NCCL is statically linked. Static linking is used for local source builds.