
Conversation

@egienvalue
Contributor

@egienvalue egienvalue commented Apr 9, 2024

Stack from ghstack (oldest at bottom):

This diff intends to build a device-generic torch.Stream and torch.Event for newly added accelerators in PyTorch.

**torch.Stream APIs**

```
# Defined in torch/csrc/Stream.cpp
class Stream(_StreamBase):
    stream_id: _int  # Stream id
    device_index: _int
    device_type: _int

    device: _device  # The device of the stream

    @overload
    def __new__(self, device: Optional[DeviceLikeType] = None, priority: _int = 0) -> Stream: ...
    @overload
    def __new__(self, stream_id: _int, device_index: _int, device_type: _int, priority: _int = 0) -> Stream: ...
    def wait_event(self, event: Event) -> None: ...
    def wait_stream(self, other: Stream) -> None: ...
    def record_event(self, event: Optional[Event] = None) -> Event: ...
    def query(self) -> _bool: ...
    def synchronize(self) -> None: ...
    def __hash__(self) -> _int: ...
    def __repr__(self) -> str: ...
    def __eq__(self, other: object) -> _bool: ...
```
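For illustration, a minimal usage sketch of these generic APIs (assuming a build with a CUDA device available; the device string is only an example, and any supported accelerator should work the same way):

```
import torch

# Create two generic streams on the same device.
s = torch.Stream(device="cuda")
other = torch.Stream(device="cuda")

# Record an event on `s` and make `other` wait for it.
e = s.record_event()
other.wait_event(e)

# query() polls for completion; synchronize() blocks until done.
if not s.query():
    s.synchronize()
```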

**torch.Event APIs**:

- IPC-related APIs are not implemented, since many device backends don't support them, but we leave the interfaces in place for future adoption by torch.cuda.Event.
- Currently only enable_timing is supported, since it is the most commonly used flag across device backends. The event flag system in PyTorch will have to be refactored to support fancier flags.
- An elapsedTime API is added to c10::Event.

```
# Defined in torch/csrc/Event.cpp
class Event(_EventBase):

    device: _device  # The device of the Event
    event_id: _int  # The raw event created by the device backend

    def __new__(self,
        device: Optional[DeviceLikeType] = None,
        enable_timing: _bool = False,
        blocking: _bool = False,
        interprocess: _bool = False) -> Event: ...
    @classmethod
    def from_ipc_handle(cls, device: DeviceLikeType, ipc_handle: bytes) -> Event: ...
    def record(self, stream: Optional[Stream] = None) -> None: ...
    def wait(self, stream: Optional[Stream] = None) -> None: ...
    def query(self) -> _bool: ...
    def elapsed_time(self, other: Event) -> _float: ...
    def synchronize(self) -> None: ...
    def ipc_handle(self) -> bytes: ...
    def __repr__(self) -> str: ...
```
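A timing sketch using only the generic APIs above (assuming a backend that supports enable_timing, e.g. CUDA, where the result is reported in milliseconds):

```
import torch

s = torch.Stream(device="cuda")
start = torch.Event(device="cuda", enable_timing=True)
end = torch.Event(device="cuda", enable_timing=True)

start.record(s)
# ... enqueue work on `s` here ...
end.record(s)

end.synchronize()                 # block until the recorded work completes
print(start.elapsed_time(end))    # elapsed time between the two events
```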

c10::Event provides new APIs:

- calculate **elapsedTime**
- get the raw event id
- synchronize the event

```
  double elapsedTime(const Event& event) const {
    return impl_.elapsedTime(event.impl_);
  }

  void* eventId() const {
    return impl_.eventId();
  }

  void synchronize() const {
    return impl_.synchronize();
  }
```
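On the C++ side, a caller could exercise the new methods roughly as follows (a sketch only; the EventFlag value is an assumption and has to be one that permits timing on the target backend):

```
#include <c10/core/Event.h>
#include <c10/core/Stream.h>

void time_region(const c10::Stream& stream) {
  c10::Event start(stream.device_type(), c10::EventFlag::BACKEND_DEFAULT);
  c10::Event end(stream.device_type(), c10::EventFlag::BACKEND_DEFAULT);

  start.record(stream);
  // ... enqueue work on `stream` here ...
  end.record(stream);

  end.synchronize();                   // new API: block until `end` completes
  double ms = start.elapsedTime(end);  // new API: time between the two events
  void* raw = end.eventId();           // new API: backend-specific raw handle
  (void)ms;
  (void)raw;
}
```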

TODO: need to find a good way to test them in PyTorch with API mocks.

Differential Revision: D56443357

@egienvalue egienvalue requested a review from albanD as a code owner April 9, 2024 01:19
@pytorch-bot

pytorch-bot bot commented Apr 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123611

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a7a1504 with merge base c5fafe9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351839


Collaborator

@albanD albanD left a comment

Thanks for the great PR description!
This looks quite good.
I think you can test it with cuda for now by adding a new test_generic_stream_event.py in the test folder.

cc @guangyey here is the python side of the shared API we can use going forward. We might want to use xpu as a non-cuda test platform as well!
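A rough sketch of what such a test could look like (a hedged sketch following the suggested file name, with hypothetical assertions, gated on CUDA availability):

```
# test/test_generic_stream_event.py (sketch)
import unittest
import torch


@unittest.skipUnless(torch.cuda.is_available(), "requires a CUDA device")
class TestGenericStreamEvent(unittest.TestCase):
    def test_stream_event_roundtrip(self):
        s = torch.Stream(device="cuda")
        start = torch.Event(device="cuda", enable_timing=True)
        end = torch.Event(device="cuda", enable_timing=True)

        start.record(s)
        e = s.record_event()   # event recorded via the stream itself
        s.wait_event(e)
        end.record(s)

        end.synchronize()
        self.assertTrue(s.query())
        self.assertTrue(end.query())
        self.assertGreaterEqual(start.elapsed_time(end), 0.0)


if __name__ == "__main__":
    unittest.main()
```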


// If torch.Stream is not created from existing Stream, retrieve one from the
// stream pool. It requires other device backends override
// getStreamFromGlobalPool method.
Collaborator

Related to my question above. Why always use the pool here and not just getStream which can have a pool under the hood?

Contributor Author

getStream is used to track the current stream used by this thread. getStreamFromGlobalPool returns a pre-created stream from the pool. I currently feel it is actually better to use getStreamFromGlobalPool. Since streams are typically moved around and copied into other stream objects, letting the pool manage the lifetime of those streams makes the implementation much simpler.

Collaborator

I agree we need to separate the concepts of

  • Get the currently active stream for the currently active device/device_idx
  • Get a fresh new stream

I guess the proposal here to use getStream/getStreamFromGlobalPool is a close analogue of the cuda naming we have today.
tbh I think clearer naming would be good here, since we have the luxury of being able to rename these APIs. In particular, getCurrentStream/getNewStream sounds simpler to understand to me (whether the new stream comes from a pool or not is an implementation detail).
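For illustration, the two accessors being distinguished could have roughly these shapes (hypothetical signatures for the proposed names, not existing APIs):

```
// getCurrentStream: the stream the calling thread currently has active
// for the given device index.
c10::Stream getCurrentStream(c10::DeviceIndex device_index);

// getNewStream: a fresh stream handed out by the backend; whether it comes
// from a pre-created pool is an implementation detail.
c10::Stream getNewStream(c10::DeviceIndex device_index, int priority = 0);
```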

Contributor Author

I vote for adding a getNewStream API.

self->stream_id,
self->device_index,
static_cast<c10::DeviceType>(self->device_type))
.wait(event->event);
Collaborator

@ngimel should we release the GIL here and below? Any edge case where these can be blocking?

Contributor Author

IIRC, cudaStreamWaitEvent won't block the current thread; the synchronization is done on the device if applicable. It should be fine not to release the GIL?

Collaborator

Yes, that was the idea for not having it originally. But there are some edge cases (at least on cuda): when some queues in the driver are full, some calls can, very rarely, become blocking, leading to rare deadlocks.

For this PR, we should keep it as is unless @ngimel says otherwise.
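For reference, if releasing the GIL ever becomes necessary here, the usual pattern in the bindings would be a scoped release around the potentially blocking call (a sketch of the general pattern only, not a change made in this PR):

```
{
  // Hypothetical: drop the GIL while the driver call may block.
  pybind11::gil_scoped_release no_gil;
  stream.wait(event->event);  // `stream` reconstructed as in the snippet above
}
```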



// NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays,cppcoreguidelines-avoid-non-const-global-variables)
static PyMethodDef THPStream_methods[] = {
{"query", THPStream_query, METH_NOARGS, nullptr},
Collaborator

@albanD If torch.Stream has these methods, then _StreamBase, which is designed to capture stream methods for dynamo, should be useless, right? Similarly, _EventBase would also be useless.

Collaborator

Yes, I think the end result is:

  • (almost) all users use torch.Stream directly
  • We can remove torch.*.Stream in most cases
  • For the ones left, we can migrate them to inherit from torch.Stream and not _StreamBase anymore
  • We should only have them if there are device-specific features that we explicitly do not want in the generic impl.

Contributor Author

@guangyey @albanD I attached another PR to test replacing _StreamBase and _EventBase with torch.Stream/Event. #123707

@guangyey
Collaborator

guangyey commented Apr 9, 2024

Thanks for the great PR description! This looks quite good. I think you can test it with cuda for now by adding a new test_generic_stream_event.py in the test folder.

cc @guangyey here is the python side of the shared API we can use going forward. We might want to use xpu as a non-cuda test platform as well!

Thanks, I will follow this PR to help finish the test on the XPU backend.


@egienvalue egienvalue requested a review from albanD April 11, 2024 16:12
pytorchmergebot pushed a commit that referenced this pull request May 10, 2024
# Motivation
According to [#123611](#123611), we support generic stream/event on XPU backend.

# Additional Context
new method/attribute on `torch.Event` for xpu
- torch.Event.event_id
- torch.Event.elapsed_time
- torch.Event.synchronize

new method on `c10::Event` on xpu backend
- c10.Event.event_id
- c10.Event.elapsed_time
- c10.Event.synchronize

Pull Request resolved: #125751
Approved by: https://github.com/jgong5, https://github.com/albanD
pytorchmergebot pushed a commit that referenced this pull request May 10, 2024
# Motivation
According to [#123611](#123611), we support generic stream/event on CUDA backend.

# Additional Context
new method/attribute on `torch.Event` for cuda
- torch.Event.event_id
- torch.Event.elapsed_time
- torch.Event.synchronize

new method on `c10::Event` on cuda backend
- c10.Event.event_id
- c10.Event.elapsed_time
- c10.Event.synchronize

Pull Request resolved: #125757
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/EikanWang

Labels

ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), fb-exported, Merged, Reverted
