[Intel GPU] Add device guard for XPU structured operator in torchgen #138802

toyxu · 2024-10-24T08:39:41Z

This PR is a supplement to #133980. The previous PR implemented the basic functionality of the XPU device guard, but it did not cover structured operators.

With this PR, the code generated in RegisterXPU.cpp for a structured operator is as follows; the device guard is now emitted in both set_output methods and as a class member.

struct structured_exp_out_functional final : public at::native::structured_exp_out {
    void set_output_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        auto current_device = guard_.current_device();
        if (C10_UNLIKELY(current_device.has_value())) {
          TORCH_INTERNAL_ASSERT(*current_device == options.device(),
            "structured kernels don't support multi-device outputs");
        } else {
          guard_.reset_device(options.device());
        }
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    void set_output_raw_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        auto current_device = guard_.current_device();
        if (C10_UNLIKELY(current_device.has_value())) {
          TORCH_INTERNAL_ASSERT(*current_device == options.device(),
            "structured kernels don't support multi-device outputs");
        } else {
          guard_.reset_device(options.device());
        }
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    const Tensor& maybe_get_output(int64_t output_idx) override {
      return outputs_[output_idx];
    }
    std::array<Tensor, 1> outputs_;
    c10::OptionalDeviceGuard guard_;
};
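The guard branch at the top of each generated set_output method can be read as: the first output pins the device, and every later output must target that same device. Below is a minimal Python simulation of that control flow, for illustration only. The class and function names are hypothetical stand-ins, not PyTorch APIs, and the real c10::OptionalDeviceGuard also switches the active device and restores the previous one on destruction, which this sketch does not model.

```python
from typing import Optional


class FakeOptionalDeviceGuard:
    """Hypothetical stand-in for c10::OptionalDeviceGuard (records a device only)."""

    def __init__(self) -> None:
        self._device: Optional[str] = None

    def current_device(self) -> Optional[str]:
        return self._device

    def reset_device(self, device: str) -> None:
        self._device = device


def set_output(guard: FakeOptionalDeviceGuard, device: str) -> None:
    """Mirror the branch emitted at the top of set_output_strided."""
    current = guard.current_device()
    if current is not None:
        # Guard already initialized: all outputs must share one device.
        assert current == device, "structured kernels don't support multi-device outputs"
    else:
        # First output: initialize the guard with that output's device.
        guard.reset_device(device)


guard = FakeOptionalDeviceGuard()
set_output(guard, "xpu:0")  # first output sets the guarded device
set_output(guard, "xpu:0")  # same device again: passes the check
```

Calling set_output(guard, "xpu:1") after this would trip the assertion, which corresponds to the TORCH_INTERNAL_ASSERT in the generated code.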

Without this change, the generated code contains no device guard:

struct structured_exp_out_functional final : public at::native::structured_exp_out {
    void set_output_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    void set_output_raw_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    const Tensor& maybe_get_output(int64_t output_idx) override {
      return outputs_[output_idx];
    }
    std::array<Tensor, 1> outputs_;
};
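The difference between the two listings comes down to whether the generator prepends a guard snippet to the set_output body for the backend being registered. A rough sketch of that idea in Python follows. It is not the actual torchgen code: the function name, the string-based dispatch key, and the set of guarded backends are assumptions made for illustration (only XPU is confirmed by this PR).

```python
import textwrap

# Backends assumed, for this sketch only, to receive a guard in structured kernels.
GUARDED_BACKENDS = {"CUDA", "MPS", "XPU"}

# C++ text taken from the generated code shown in the PR description.
GUARD_SNIPPET = textwrap.dedent("""\
    auto current_device = guard_.current_device();
    if (C10_UNLIKELY(current_device.has_value())) {
      TORCH_INTERNAL_ASSERT(*current_device == options.device(),
        "structured kernels don't support multi-device outputs");
    } else {
      guard_.reset_device(options.device());
    }
""")


def gen_set_output_body(dispatch_key: str) -> str:
    """Return C++ body text, with the guard prepended for guarded backends."""
    maybe_set_guard = GUARD_SNIPPET if dispatch_key in GUARDED_BACKENDS else ""
    create = "outputs_[output_idx] = create_out(sizes, strides, options);\n"
    return maybe_set_guard + create


print(gen_set_output_body("XPU"))
```

With a key outside the guarded set, such as "CPU" here, the function returns only the create_out line, which matches the shape of the listing without the guard.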

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

pytorch-bot · 2024-10-24T08:39:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138802

✅ No Failures

As of commit 2d25b97 with merge base c3087ac:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2024-10-24T08:42:07Z

Please seek CI approval before scheduling CIFlow labels

EikanWang

Please refine the code.

EikanWang · 2024-10-24T13:15:00Z

torchgen/dest/register_dispatch_key.py

            guard_field = "c10::OptionalDeviceGuard guard_;"
+        elif self.backend_index.dispatch_key == DispatchKey.XPU:
+            guard_field = "c10::OptionalDeviceGuard guard_;"
        else:


Please refine the code, because CUDA, MPS, and XPU share the same guard_field code.

toyxu · 2024-10-30

It's not the same. CUDA uses OptionalCUDAGuard, and MPS will move to optionalMPSGuard.
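The reply above explains why the branches are kept separate: the guard member declared in the generated class depends on the backend. A small sketch of that selection is given below, under stated assumptions. Key is a hypothetical stand-in for torchgen's DispatchKey, guard_field_for is an illustrative name, the c10::cuda namespace for the CUDA guard is my assumption, and the MPS branch is inferred from the context line in the diff rather than confirmed by it.

```python
from enum import Enum, auto


class Key(Enum):
    """Hypothetical stand-in for torchgen's DispatchKey."""

    CPU = auto()
    CUDA = auto()
    MPS = auto()
    XPU = auto()


def guard_field_for(key: Key) -> str:
    """Return the C++ member declaration for the guard, or '' if none."""
    if key is Key.CUDA:
        # CUDA uses its own guard type rather than the generic one.
        return "c10::cuda::OptionalCUDAGuard guard_;"
    if key in (Key.MPS, Key.XPU):
        # Generic guard; the thread notes MPS may later get its own type,
        # at which point this branch would no longer be shared.
        return "c10::OptionalDeviceGuard guard_;"
    return ""


for key in Key:
    print(key.name, "->", repr(guard_field_for(key)))
```

Keeping XPU in its own branch, as the diff does, avoids having to split a merged condition again if MPS changes its guard type.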

EikanWang · 2024-10-24T13:16:41Z

@xytintel, pls. elaborate on the details in the PR description to explain why.

EikanWang · 2024-10-30T04:15:30Z

Pls. check the CI signal.

fengyuan14 · 2024-11-01T06:10:02Z

Please rebase.

test/inductor/test_torchinductor_opinfo.py

ezyang · 2024-11-13T03:26:51Z

@pytorchbot merge

pytorchmergebot · 2024-11-13T03:28:44Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

…ytorch#138802)

This PR is a supplement to pytorch#133980. The previous PR implemented the basic functionality of the XPU device guard, but it did not cover structured operators. With this PR, the structured-operator code generated in RegisterXPU.cpp includes the device guard; the generated code with and without the change is identical to the two listings in the PR description.

Pull Request resolved: pytorch#138802
Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/ezyang

toyxu added 2 commits October 24, 2024 16:24

Add device guard support for XPU structured ops

3107a0e

Add XPU device guard in gen_class_set_output_body

34169f5

pytorch-bot bot added the topic: not user facing topic category label Oct 24, 2024

toyxu marked this pull request as draft October 24, 2024 08:40

ZhiweiYan-96 added the ciflow/xpu Run XPU CI tasks label Oct 24, 2024

pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Oct 24, 2024

ZhiweiYan-96 added ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks labels Oct 24, 2024

pytorch-bot bot removed ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request labels Oct 24, 2024

ZhiweiYan-96 requested a review from EikanWang October 24, 2024 08:42

pytorchbot added the open source label Oct 24, 2024

EikanWang requested changes Oct 24, 2024

EikanWang added the ciflow/xpu Run XPU CI tasks label Oct 24, 2024

guangyey added the release notes: xpu release notes category label Oct 28, 2024

Merge branch 'pytorch:main' into xyt/xpu_structured_op_guard

0d5d670

toyxu marked this pull request as ready for review October 30, 2024 02:57

toyxu requested a review from EikanWang October 30, 2024 02:57

EikanWang approved these changes Oct 30, 2024

toyxu added 2 commits October 31, 2024 15:21

Merge branch 'pytorch:main' into xyt/xpu_structured_op_guard

58bcc74

Merge branch 'pytorch:main' into xyt/xpu_structured_op_guard

e447533

toyxu added 2 commits November 5, 2024 09:30

Merge branch 'pytorch:main' into xyt/xpu_structured_op_guard

8c25ee3

Update xpu.txt

9b3c441

pytorch-bot bot added the module: inductor label Nov 6, 2024

toyxu added 3 commits November 6, 2024 16:36

Merge branch 'pytorch:main' into xyt/xpu_structured_op_guard

474fcb3

Update test_torchinductor_opinfo.py

f77fc96

fix typo

3b7e57d

toyxu mentioned this pull request Nov 6, 2024

[Intel GPU] Allow XPU device in cdist and pdist operators #138441

Closed

toyxu added 2 commits November 7, 2024 11:56

Merge branch 'pytorch:main' into xyt/xpu_structured_op_guard

2bb15c5

Revert xpu pin

fd946df

guangyey reviewed Nov 7, 2024

test/inductor/test_torchinductor_opinfo.py (outdated)

toyxu added 2 commits November 10, 2024 15:44

revert

b6dfb41

Merge branch 'pytorch:main' into xyt/xpu_structured_op_guard

2d25b97

guangyey approved these changes Nov 12, 2024

guangyey requested review from atalman, ezyang and malfet November 12, 2024 06:11

guangyey added this to the 2.6.0 milestone Nov 12, 2024

EikanWang approved these changes Nov 12, 2024

ezyang approved these changes Nov 13, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 13, 2024

pytorchmergebot added the merging label Nov 13, 2024

pytorchmergebot added the Merged label Nov 13, 2024

pytorchmergebot closed this in 79fb741 Nov 13, 2024

pytorchmergebot removed the merging label Nov 13, 2024

atalman mentioned this pull request Jan 13, 2025

Release 2.6.0 validations checklist and cherry-picks #144503

Closed

73 tasks
