Conversation

Collaborator

@guangyey guangyey commented May 8, 2024

Stack from ghstack (oldest at bottom):

Motivation

According to #123611, we support generic stream/event on the CUDA backend.

Additional Context

New methods/attributes on torch.Event for CUDA:

  • torch.Event.event_id
  • torch.Event.elapsed_time
  • torch.Event.synchronize

New methods on c10::Event for the CUDA backend:

  • c10::Event::event_id
  • c10::Event::elapsed_time
  • c10::Event::synchronize
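A minimal usage sketch of the API listed above, assuming a CUDA-enabled PyTorch build; the exact signatures follow this PR's description, and the availability check lets the snippet degrade gracefully on CPU-only machines:

```python
import torch

# Hedged sketch of the generic torch.Event API this PR extends for CUDA
# (event_id, elapsed_time, synchronize); skipped when CUDA is unavailable.
if torch.cuda.is_available():
    start = torch.Event(device="cuda", enable_timing=True)
    end = torch.Event(device="cuda", enable_timing=True)
    start.record()                         # enqueue on the current stream
    torch.ones(1 << 20, device="cuda").sum()
    end.record()
    end.synchronize()                      # block until the event completes
    elapsed_ms = start.elapsed_time(end)   # both events recorded beforehand
    print(f"event_id={start.event_id}, elapsed={elapsed_ms:.3f} ms")
else:
    elapsed_ms = None
    print("CUDA not available; skipping timing example")
```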

@pytorch-bot

pytorch-bot bot commented May 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125757

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c530269 with merge base fcbf2b6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey changed the title Support generic stream/event on CUDA backend [WIP] Support generic stream/event on CUDA backend May 8, 2024
guangyey added a commit that referenced this pull request May 8, 2024
ghstack-source-id: 944cb60
Pull Request resolved: #125757
@guangyey guangyey changed the title [WIP] Support generic stream/event on CUDA backend Support generic stream/event on CUDA backend May 9, 2024
@guangyey guangyey added ciflow/trunk Trigger trunk jobs on your pull request topic: improvements topic category labels May 9, 2024
guangyey added a commit that referenced this pull request May 9, 2024
ghstack-source-id: 7976f76
Pull Request resolved: #125757
guangyey added a commit that referenced this pull request May 9, 2024
ghstack-source-id: 9580e31
Pull Request resolved: #125757
Collaborator

@albanD albanD left a comment


Thanks!

"Both events must be recorded before calculating elapsed time.");
cudaEvent_t cuda_event1 = static_cast<cudaEvent_t>(event1);
cudaEvent_t cuda_event2 = static_cast<cudaEvent_t>(event2);
float time_ms = 0;
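The assertion quoted above enforces that both events are recorded before elapsed time is computed. As a pure-Python toy model of that invariant (an illustration, not the PR's C++ code):

```python
import time

# Toy model (an assumption for illustration, not PyTorch source) of the
# invariant in the snippet above: elapsed_time is only valid once both
# events have been recorded.
class ToyEvent:
    def __init__(self):
        self.timestamp = None  # set when the event is recorded

    def record(self):
        self.timestamp = time.perf_counter()

    def elapsed_time(self, other):
        # Mirrors the C++ check: "Both events must be recorded before
        # calculating elapsed time."
        if self.timestamp is None or other.timestamp is None:
            raise RuntimeError(
                "Both events must be recorded before calculating elapsed time.")
        return (other.timestamp - self.timestamp) * 1000.0  # milliseconds

start, end = ToyEvent(), ToyEvent()
start.record()
time.sleep(0.01)
end.record()
print(f"elapsed: {start.elapsed_time(end):.1f} ms")
```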
Collaborator


Could you add the device guard like

CUDAGuard guard(device_index_);
here? Looks important.

Collaborator Author


Updated. We also provide support for the HIP backend.

Collaborator

@albanD albanD May 10, 2024


Why did you add get/set calls here and not DeviceGuard guard(Device(c10::kCUDA, device_index));?

Not sure about the exact arguments.

Collaborator Author


In my opinion, CUDAGuardImpl is an implementation of DeviceGuard; DeviceGuard is higher-level than CUDAGuardImpl, so it would be unreasonable to use the high-level code (DeviceGuard) inside the low-level code (CUDAGuardImpl).
If you prefer to use DeviceGuard, I can prepare a PR to refine it.

Collaborator

@jgong5 jgong5 left a comment


Test failing?

device->type(),
(enable_timing ? c10::EventFlag::PYTORCH_DEFAULT
: c10::EventFlag::BACKEND_DEFAULT));
// See note [Flags defining the behavior of events]
Collaborator Author


bug fix

@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label May 10, 2024
guangyey added a commit that referenced this pull request May 10, 2024
ghstack-source-id: 03211a8
Pull Request resolved: #125757
@guangyey guangyey changed the title Support generic stream/event on CUDA backend Support generic stream/event on CUDA/HIP backend May 10, 2024
guangyey added a commit that referenced this pull request May 10, 2024
ghstack-source-id: f608eb4
Pull Request resolved: #125757
@guangyey guangyey requested a review from jgong5 May 10, 2024 08:45
@guangyey
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@guangyey guangyey added the topic: new features topic category label May 10, 2024
@guangyey
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label (same as the previous attempt).

Details for Dev Infra team: raised by workflow job.

guangyey added 2 commits May 10, 2024 12:44
@guangyey guangyey added the release notes: cuda release notes category label May 10, 2024
@guangyey
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

HIPCachingAllocatorMasqueradingAsCUDA::recordStreamMasqueradingAsCUDA(data_ptr, hip_stream);
}

double elapsedTime(void* event1, void* event2, const DeviceIndex device_index)
Collaborator


Can we please make sure the relevant submodule owner has a chance to review the change before we merge it in other submodules?

cc @jeffdaily does this timing code look good?

Collaborator


Nothing jumps out as wrong. I'll lean on CI to tell the truth :-).

Collaborator Author

@guangyey guangyey May 11, 2024


> Can we please make sure the relevant submodule owner has a chance to review the change before we merge it in other submodules?
>
> cc @jeffdaily does this timing code look good?

Sorry about that... I will inform you before merging a PR if there is a big code change.

Collaborator Author


I referred to the destroyEvent code to guard against creating a new CUDA context here.

void destroyEvent(void* event, const DeviceIndex device_index)

@github-actions github-actions bot deleted the gh/guangyey/29/head branch June 11, 2024 01:54

Labels

  • ciflow/mps Run MPS tests (subset of trunk)
  • ciflow/rocm Trigger "default" config CI on ROCm
  • ciflow/trunk Trigger trunk jobs on your pull request
  • ciflow/xpu Run XPU CI tasks
  • Merged
  • open source
  • release notes: cuda release notes category
  • topic: improvements topic category
  • topic: new features topic category

8 participants