Support generic stream/event on CUDA/HIP backend #125757
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125757
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c530269 with merge base fcbf2b6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
albanD left a comment:
Thanks!
| "Both events must be recorded before calculating elapsed time."); | ||
| cudaEvent_t cuda_event1 = static_cast<cudaEvent_t>(event1); | ||
| cudaEvent_t cuda_event2 = static_cast<cudaEvent_t>(event2); | ||
| float time_ms = 0; |
Could you add the device guard like pytorch/aten/src/ATen/cuda/CUDAEvent.h, line 157 in 1ecea51: `CUDAGuard guard(device_index_);`?
Updated. We also added support for the HIP backend.
Why did you add get/set calls here and not `DeviceGuard guard(Device(c10::kCUDA, device_index));`?
Not sure about the exact arguments.
In my opinion, CUDAGuardImpl is an implementation backing DeviceGuard; DeviceGuard is higher-level than CUDAGuardImpl, so it seems unreasonable to use the high-level code (DeviceGuard) inside the low-level code (CUDAGuardImpl).
If you prefer to use DeviceGuard, I can prepare a PR to refine it.
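For illustration, here is a minimal sketch of the save/set/restore pattern being discussed, with raw CUDA runtime calls standing in for the c10 device helpers and error-checking macros, and a made-up function name; it is not the PR's actual code.

```cpp
#include <cuda_runtime.h>

// Sketch only: the real CUDAGuardImpl::elapsedTime presumably uses c10's
// device wrappers and error checking rather than raw, unchecked CUDA calls.
double elapsed_time_on_device(void* event1, void* event2, int device_index) {
  int orig_device = -1;
  cudaGetDevice(&orig_device);   // remember the caller's current device
  cudaSetDevice(device_index);   // switch to the device the events belong to

  auto start = static_cast<cudaEvent_t>(event1);
  auto end = static_cast<cudaEvent_t>(event2);
  float time_ms = 0.0f;
  cudaEventElapsedTime(&time_ms, start, end);  // both events must be recorded

  cudaSetDevice(orig_device);    // restore the caller's device
  return static_cast<double>(time_ms);
}
```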
jgong5 left a comment:
Test failing?
    device->type(),
    (enable_timing ? c10::EventFlag::PYTORCH_DEFAULT
                   : c10::EventFlag::BACKEND_DEFAULT));
// See note [Flags defining the behavior of events]
bug fix
@pytorchbot merge
Merge failed. Reason: This PR needs a label. To add a label, you can comment to pytorchbot. For more information, see the merge documentation. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge failed. Reason: This PR needs a label. To add a label, you can comment to pytorchbot. For more information, see the merge documentation. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
  HIPCachingAllocatorMasqueradingAsCUDA::recordStreamMasqueradingAsCUDA(data_ptr, hip_stream);
}

double elapsedTime(void* event1, void* event2, const DeviceIndex device_index)
Can we please make sure the relevant submodule owner has a chance to review the change before we merge it into other submodules?
cc @jeffdaily, does this timing code look good?
Nothing jumps out as wrong. I'll lean on CI to tell the truth :-).
> Can we please make sure the relevant submodule owner has a chance to review the change before we merge it into other submodules?
> cc @jeffdaily, does this timing code look good?
Sorry about that. I will inform you before merging a PR if there is a big code change.
I referred to the destroyEvent code to guard it so that it does not create a new CUDA context here:
pytorch/c10/cuda/impl/CUDAGuardImpl.h, line 106 in b24ad7e: `void destroyEvent(void* event, const DeviceIndex device_index)`
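As a companion sketch of that destroy-time pattern (again with raw CUDA runtime calls standing in for the c10 wrappers and warning macros, and a hypothetical function name), the shape is roughly:

```cpp
#include <cuda_runtime.h>

// Sketch only: shows the device save/restore shape the comment refers to,
// not the actual destroyEvent implementation in CUDAGuardImpl.h.
void destroy_event_on_device(void* event, int device_index) {
  if (event == nullptr) {
    return;  // nothing to destroy
  }
  auto cuda_event = static_cast<cudaEvent_t>(event);

  int orig_device = -1;
  cudaGetDevice(&orig_device);   // remember the caller's current device
  cudaSetDevice(device_index);   // switch to the device that owns the event
  cudaEventDestroy(cuda_event);  // destroy the event on its own device
  cudaSetDevice(orig_device);    // restore the caller's device
}
```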
Stack from ghstack (oldest at bottom):

Motivation

According to #123611, we support generic stream/event on the CUDA backend.

Additional Context

New methods/attributes on `torch.Event` for CUDA:
- torch.Event.event_id
- torch.Event.elapsed_time
- torch.Event.synchronize

New methods on `c10::Event` on the CUDA backend (see the usage sketch after this list):
- c10.Event.event_id
- c10.Event.elapsed_time
- c10.Event.synchronize
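A rough C++ usage sketch of the extended `c10::Event` API on the CUDA backend; the method names come from the list above, but the exact C++ signatures, the timing behavior of the default event flag, and the surrounding setup are assumptions for illustration only.

```cpp
#include <c10/core/Event.h>
#include <c10/cuda/CUDAStream.h>

int main() {
  // Grab a CUDA stream from the pool and time a stretch of work on it with
  // the backend-generic c10::Event API this PR extends.
  c10::cuda::CUDAStream stream = c10::cuda::getStreamFromPool();

  // Assumption: the default event flag permits timing; pick a timing-enabled
  // flag for your backend if it does not.
  c10::Event start(c10::DeviceType::CUDA);
  c10::Event end(c10::DeviceType::CUDA);

  start.record(stream);
  // ... enqueue kernels or memory copies on `stream` here ...
  end.record(stream);

  end.synchronize();                    // new in this PR: wait for the recorded work
  double ms = start.elapsed_time(end);  // new in this PR: elapsed milliseconds (signature assumed)
  (void)ms;
  return 0;
}
```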