@copybara-service

PR #13603: NVTX: name threads, CUDA devices and CUDA streams

Imported from GitHub PR openxla/xla#13603

This PR aims to improve the profiling experience by using NVTX to name threads, CUDA devices, and CUDA streams. The names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.
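
For readers unfamiliar with the NVTX naming calls, here is a minimal sketch of how threads, CUDA devices, and CUDA streams can be given names that Nsight Systems displays. It is not the code from this PR: the helper functions, the name formats, and the `SKETCH_USE_NVTX` macro are all hypothetical and chosen only for illustration. The NVTX entry points used (`nvtxNameCudaDeviceA`, `nvtxNameCudaStreamA`, `nvtxNameOsThreadA`) are the ones declared in the NVTX v3 headers, and the thread-ID lookup assumes Linux.

```cpp
#include <string>

// Hypothetical name formats, for illustration only; the names chosen in
// this PR may differ.
std::string DeviceName(int ordinal) {
  return "GPU " + std::to_string(ordinal);
}

std::string StreamName(int ordinal, const std::string& purpose) {
  return DeviceName(ordinal) + " " + purpose + " stream";
}

std::string ThreadName(const std::string& pool, int index) {
  return pool + " thread " + std::to_string(index);
}

// The NVTX calls are opt-in (define SKETCH_USE_NVTX) so that the helpers
// above still compile on machines without the CUDA toolkit.
#ifdef SKETCH_USE_NVTX
#include <cstdint>
#include <sys/syscall.h>  // SYS_gettid (Linux only)
#include <unistd.h>       // syscall

#include <cuda_runtime_api.h>
#include <nvtx3/nvToolsExt.h>
#include <nvtx3/nvToolsExtCudaRt.h>

// Attaches names to a device, one of its streams, and the calling thread.
void NameForProfiler(int device_ordinal, cudaStream_t stream) {
  nvtxNameCudaDeviceA(device_ordinal, DeviceName(device_ordinal).c_str());
  nvtxNameCudaStreamA(stream,
                      StreamName(device_ordinal, "compute").c_str());
  // NVTX identifies OS threads by their kernel thread ID.
  const auto tid = static_cast<uint32_t>(syscall(SYS_gettid));
  nvtxNameOsThreadA(tid, ThreadName("worker", 0).c_str());
}
#endif  // SKETCH_USE_NVTX
```

With the CUDA toolkit and NVTX headers installed, building with `-DSKETCH_USE_NVTX` and calling `NameForProfiler(0, stream)` once per stream before the profiled work starts should be enough for the names to appear on the corresponding rows in Nsight Systems.
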
Copybara import of the project:

--
5b3121c58db8aa1b6529f0aeb8573be8bf2cde80 by Olli Lupton:

NVTX: name threads, CUDA devices and CUDA streams

--
d973674de6218fcee88473d85bb43ba345652fdf by Olli Lupton:

Address review comments

--
918cf3e7b87150e9d666b218bbd9aca0cae606a4 by Olli Lupton:

Alternative for @jbaiocchi

--
1d1978437e64c0dac97e97ea4320a6dcb3945296 by Olli Lupton:

Address more review comments

Merging this change closes #13603

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13603 from olupton:name-devices-streams-and-threads 1d1978437e64c0dac97e97ea4320a6dcb3945296

@copybara-service bot force-pushed the exported_pr_644704876 branch 3 times, most recently from 9893f8c to 4128657, on June 20, 2024 05:29

PiperOrigin-RevId: 644901234
@copybara-service bot force-pushed the exported_pr_644704876 branch from 4128657 to 9b12cd1 on June 20, 2024 06:00
@copybara-service bot closed this on Jun 20, 2024
@copybara-service bot merged commit 9b12cd1 into master on Jun 20, 2024
@copybara-service bot deleted the exported_pr_644704876 branch on June 20, 2024 06:00