[inductor] autotune benchmark support for cpu #125159

jgong5 · 2024-04-29T14:57:31Z

Stack from ghstack (oldest at bottom):

This PR adds the autotune infrastructure for CPU. It generalizes and extends BenchmarkRequest with CPU support and a C++ module loader. It also adds a do_bench_cpu utility function that benchmarks a function on CPU: it runs warm-up iterations first, then returns the median timing over multiple trials.
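
For readers who want a concrete picture of "warm-ups, then the median of multiple trials", here is a minimal sketch. It is not the code from this PR: the name do_bench_cpu_sketch, the parameters warmup and times, their defaults, and the choice of milliseconds are all assumptions made for illustration.

```python
import statistics
import time


def do_bench_cpu_sketch(fn, warmup=5, times=20):
    """Return the median wall-clock time of fn() in milliseconds.

    Hypothetical illustration only; the name, parameters, and defaults
    are assumptions and do not mirror the helper added by the PR.
    """
    if times < 1:
        raise ValueError("times must be at least 1")

    # Warm-up calls are executed but not timed, so one-off costs
    # (cache misses, lazy initialization) do not skew the result.
    for _ in range(warmup):
        fn()

    durations_ms = []
    for _ in range(times):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive than the mean to occasional slow trials.
    return statistics.median(durations_ms)
```

A call such as do_bench_cpu_sketch(lambda: sum(range(10_000))) returns a single float; the exact value depends on the machine.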

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire

[ghstack-poisoned]

pytorch-bot · 2024-04-29T14:57:34Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125159

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 7111f21 with merge base 5007312:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu) (gh) (similar failure)
test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_uint8

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jansel · 2024-05-07T17:25:21Z

third_party/ideep

Do submodule update in a different PR.

Thanks. That was an accidental "git add". Fixed.

shunting314 · 2024-05-07T18:04:17Z

torch/_inductor/ir.py

+        if is_cpu_device(args):
+            return do_bench_cpu(lambda: algo(*args, out=out))
+        else:
+            return do_bench(lambda: algo(*args, out=out))


Can we have this device dispatch logic inside do_bench?

Basically, rename the current do_bench to do_bench_gpu, and create a new do_bench function that dispatches to do_bench_cpu or do_bench_gpu based on the device.

Thanks. I did the refactoring in a separate PR: #125736
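
The dispatch suggested above could look roughly like the sketch below. This is an illustration under stated assumptions, not the refactoring from #125736: is_cpu_device_sketch and make_do_bench are hypothetical names, and the device check assumes each input exposes a device attribute with a type field (as torch tensors do); inputs without one are treated as CPU.

```python
def is_cpu_device_sketch(inputs):
    """Return True if every input reports a device of type "cpu".

    Hypothetical helper: inputs lacking a `device` attribute count as CPU.
    """
    return all(
        getattr(getattr(x, "device", None), "type", "cpu") == "cpu"
        for x in inputs
    )


def make_do_bench(bench_cpu, bench_gpu):
    """Build a single do_bench entry point from two device-specific benchmarks."""

    def do_bench(fn, inputs):
        # Choose the benchmark routine once, based on where the inputs live,
        # so call sites no longer need their own if/else on the device.
        bench = bench_cpu if is_cpu_device_sketch(inputs) else bench_gpu
        return bench(fn)

    return do_bench
```

With a wrapper like this, the call site in the diff above would reduce to a single do_bench(...) call, and the device check would live in one place.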

jgong5 · 2024-05-09T00:26:26Z

@pytorchbot merge

pytorchmergebot · 2024-05-09T00:28:25Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

jerryzh168 · 2024-05-14T00:18:55Z

We do have an API that depends on do_bench, btw: https://github.com/pytorch/ao/blob/main/torchao/quantization/autoquant.py#L200. Is this not a public API?

jgong5 · 2024-05-14T00:29:41Z

> We do have an API that depends on do_bench, btw: https://github.com/pytorch/ao/blob/main/torchao/quantization/autoquant.py#L200. Is this not a public API?

I thought it was only used internally by Inductor. You may have to rename it to do_bench_gpu now, since the usage is CUDA-specific?

Update

49f0aa9

[ghstack-poisoned]

jgong5 mentioned this pull request Apr 29, 2024

[inductor][cpp] move some common cpp utils to cpp_utils.py #125152

Closed

pytorch-bot bot added the ciflow/inductor and module: inductor labels Apr 29, 2024

jgong5 mentioned this pull request Apr 29, 2024

[inductor][cpp] GEMM template (infra and fp32) #124021

Closed

pytorch-bot bot added the oncall: pt2 label Apr 29, 2024

Update

14914b0

[ghstack-poisoned]

pytorchbot added the open source label Apr 29, 2024

Update

e963a06

[ghstack-poisoned]

jgong5 added the topic: not user facing label Apr 29, 2024

Jiong Gong added 6 commits April 29, 2024 08:20

Update

15c2028

[ghstack-poisoned]

Update

ebffccc

[ghstack-poisoned]

Update

41aa036

[ghstack-poisoned]

Update

afe8666

[ghstack-poisoned]

Update

0aacf2b

[ghstack-poisoned]

pytorch-bot bot added the ciflow/linux-aarch64 and module: mkldnn labels May 7, 2024

jgong5 mentioned this pull request May 7, 2024

[RFC] Add Cpp Template for GEMM related ops via max-autotune for Inductor CPU #125683

Open

18 tasks

jgong5 requested review from jansel and shunting314 May 7, 2024 15:07

jansel requested changes May 7, 2024

shunting314 reviewed May 7, 2024

jgong5 mentioned this pull request May 8, 2024

[inductor] refactor: device dispatch inside do_bench #125736

Closed

Jiong Gong added 2 commits May 8, 2024 11:22

jansel approved these changes May 8, 2024

jgong5 added the ciflow/trunk label May 8, 2024

pytorchmergebot added the merging label May 9, 2024

pytorchmergebot closed this in 8def2e9 May 9, 2024

pytorchmergebot added Merged and removed merging labels May 9, 2024

github-actions bot deleted the gh/jgong5/42/head branch June 13, 2024 01:54
