Refactor gpu trace to be device-agnostic #121794
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121794
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 4ee0b6f with merge base 3d3d4e1.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
P = ParamSpec("P")

class CallbackRegistry(Generic[P]):
Move CallbackRegistry from torch/utils/_cuda_trace.py to here, so that it can be shared by each backend.
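As a rough illustration, a minimal sketch of what such a shared registry could look like once it is hoisted into a common module (class and attribute names follow the diff above; the logging details are assumptions, not the exact PyTorch source):

import logging
from typing import Callable, Generic, List
from typing_extensions import ParamSpec  # typing.ParamSpec on Python 3.10+

logger = logging.getLogger(__name__)
P = ParamSpec("P")

class CallbackRegistry(Generic[P]):
    """Holds the trace callbacks for one event type and fires them in order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.callback_list: List[Callable[P, None]] = []

    def add_callback(self, cb: Callable[P, None]) -> None:
        self.callback_list.append(cb)

    def fire_callbacks(self, *args: P.args, **kwargs: P.kwargs) -> None:
        # A failing callback must not prevent the remaining ones from running.
        for cb in self.callback_list:
            try:
                cb(*args, **kwargs)
            except Exception:
                logger.exception(
                    "Exception in callback for %s registered with GPU trace", self.name
                )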
torch/csrc/PyInterpreter.cpp
Outdated
pybind11::gil_scoped_acquire gil;                                         \
try {                                                                     \
  std::string module_name =                                               \
      "torch." + at::DeviceTypeName(device_type, true) + "._gpu_trace";   \
An assumption: the callback functions will be located in torch.xxx._gpu_trace.py for each backend.
You can use `_get_device_module` (Line 891 in be0bdf1):
def _get_device_module(device_type: str):
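Roughly, the Python-side equivalent of this suggestion would be something like the sketch below, assuming each backend exposes a `_gpu_trace` submodule whose attributes are CallbackRegistry instances (the helper name `fire_gpu_trace_hook` is hypothetical):

import importlib

from torch._utils import _get_device_module

def fire_gpu_trace_hook(device_type: str, hook_name: str, *args) -> None:
    # Resolve e.g. "cuda" -> torch.cuda, "xpu" -> torch.xpu.
    device_mod = _get_device_module(device_type)
    # Assumed layout: torch.<backend>._gpu_trace holds one CallbackRegistry per event.
    trace_mod = importlib.import_module(device_mod.__name__ + "._gpu_trace")
    getattr(trace_mod, hook_name).fire_callbacks(*args)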
Do you mean like this:
try { \
py::module utils_mod = py::module::import("torch._utils"); \
py::object get_device_module = utils_mod.attr("_get_device_module"); \
py::object hook = get_device_module(DeviceTypeName(device_type, true)) \
.attr("_gpu_trace") \
.attr(func_name) \
.attr("fire_callbacks"); \
hook(__VA_ARGS__); \
} catch (const std::exception& e) { \
LOG(ERROR) << device_type \
<< " trace hook execution failed: " << e.what(); \
} \
Option 2:
try { \
std::string module_name = "torch." + DeviceTypeName(device_type, true); \
py::module mod = py::module::import(module_name.c_str()); \
py::object hook = \
mod.attr("_gpu_trace").attr(func_name).attr("fire_callbacks"); \
hook(__VA_ARGS__); \
} catch (const std::exception& e) { \
LOG(ERROR) << device_type \
<< " trace hook execution failed: " << e.what(); \
} \

Option 2 is more concise, and pybind11 can also raise a friendly error.
Which one do you prefer?
May I know if you have any suggestions?
@albanD could you help review this PR?
albanD
left a comment
Small change to simplify the C++ side; sounds good otherwise.
torch/csrc/PyInterpreter.cpp
Outdated
pybind11::gil_scoped_acquire gil;                                         \
try {                                                                     \
  std::string module_name =                                               \
      "torch." + at::DeviceTypeName(device_type, true) + "._gpu_trace";   \
You can use `_get_device_module` (Line 891 in be0bdf1):
def _get_device_module(device_type: str):
# Motivation
Refactor GPU trace to be device-agnostic. GPU trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic so that it can be shared among device backends.
# Solution
Move `_cuda_trace.py` to `_gpu_trace.py`, which lets each device backend own its callbacks.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang

[ghstack-poisoned]
@pytorchbot merge
@albanD May I know if you could help review this PR again?
@albanD may I know if you have additional comments on this PR?
@albanD Could you help review this PR again? I made a minor code change when I fixed ROCm's tests.
albanD
left a comment
Looks good!
Sorry for the delay. @huydhn feel free to ping me on these reverts if I miss them!
Thank you very much~
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed
Reason: 5 jobs have failed; the first few of them are: rocm, trunk, linux-binary-libtorch-pre-cxx11, linux-binary-manywheel, linux-binary-libtorch-cxx11-abi.
Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Motivation
Support GPU trace on the XPU backend. Add GPU trace to the XPU runtime. It is beneficial to generalize the device caching allocator in the next step.

Pull Request resolved: #121795
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/jgong5, https://github.com/albanD
ghstack dependencies: #121794
# Motivation
Refactor GPU trace to be device-agnostic. GPU trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic so that it can be shared among device backends.
# Solution
Move `_cuda_trace.py` to `_gpu_trace.py`, which lets each device backend own its callbacks.

Pull Request resolved: #121794
Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
)" This reverts commit 148a8de. Reverted #120891 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I need to revert it to resolve a conflict in trunk #121794 (comment). Please help reland the change after ([comment](#120891 (comment)))
This reverts commit 91ead3e. Reverted #121795 on behalf of https://github.com/huydhn due to: Sorry for reverting your change, but it breaks ROCm jobs in trunk (https://hud.pytorch.org/pytorch/pytorch/commit/74deacbf31d032a2659dc1633dc3e5248921d466). Please help take a look and reland the change.
This reverts commit 74deacb. Reverted #121794 on behalf of https://github.com/huydhn due to: Sorry for reverting your change, but it breaks ROCm jobs in trunk (https://hud.pytorch.org/pytorch/pytorch/commit/74deacbf31d032a2659dc1633dc3e5248921d466). Please help take a look and reland the change.
Stack from ghstack (oldest at bottom):
Motivation
Refactor GPU trace to be device-agnostic. GPU trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic so that it can be shared among device backends.
Solution
Move `_cuda_trace.py` to `_gpu_trace.py`, which lets each device backend own its callbacks.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang
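For illustration only, a hedged sketch of what registering a trace callback might look like after this refactor; the module path and the registration function name are assumptions based on the description above, not verified API:

# Hypothetical post-refactor usage; the CUDA-only predecessor lived in
# torch.utils._cuda_trace. Each backend (cuda, xpu, ...) is expected to own
# an equivalent module under torch.<backend>._gpu_trace.
import torch.cuda._gpu_trace as gpu_trace  # assumed module path

def on_event_creation(event_id: int) -> None:
    print(f"GPU event created: {event_id:#x}")

gpu_trace.register_callback_for_event_creation(on_event_creation)  # assumed name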