[inductor] show performance for each autotune config for a kernel by shunting314 · Pull Request #96248 · pytorch/pytorch

shunting314 · 2023-03-08T01:00:11Z

Stack from ghstack (oldest at bottom):

Add the ability to benchmark the performance of each autotune config of each kernel.

To use it:

1. Run the model with TORCHINDUCTOR_BENCHMARK_KERNEL enabled, e.g.:

TORCHINDUCTOR_BENCHMARK_KERNEL=1 python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --dashboard --only vgg16 --disable-cudagraphs --training

Get the path to the compiled module from the log, e.g.:

Compiled module path: /tmp/torchinductor_shunting/mj/cmjv5hyt3uq2v7beqkthcl4ul6fh2luwfzmd4tnrquworcmqz4i3.py
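When many graphs are compiled in one run, the log can contain several of these lines. The helper below is a hypothetical sketch, not part of Inductor. It assumes the run's output has already been captured to text and that the log line has exactly the "Compiled module path: <path>.py" shape shown above. It pulls out every compiled-module path in order.

```python
import re
from typing import List

# Matches log lines shaped like the example above:
#   Compiled module path: /tmp/torchinductor_<user>/xx/<hash>.py
_PATH_RE = re.compile(r"Compiled module path:\s*(\S+\.py)")


def find_compiled_module_paths(log_text: str) -> List[str]:
    """Return every compiled-module path mentioned in a captured log, in order."""
    return _PATH_RE.findall(log_text)


if __name__ == "__main__":
    sample_log = (
        "some unrelated line\n"
        "Compiled module path: /tmp/torchinductor_shunting/mj/"
        "cmjv5hyt3uq2v7beqkthcl4ul6fh2luwfzmd4tnrquworcmqz4i3.py\n"
    )
    for path in find_compiled_module_paths(sample_log):
        print(path)
```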

2. Run the compiled module directly with the following options:

- -k to benchmark each kernel
- -c to benchmark each config for each kernel
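This thread does not show the compiled module's actual argument parser. The sketch below only illustrates how two boolean short flags like these can be combined as `-kc` using the standard `argparse` module. The `dest` names are hypothetical and are not taken from Inductor.

```python
import argparse
from typing import List, Optional


def parse_benchmark_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical parser: the dest names are illustrative, not Inductor's own.
    parser = argparse.ArgumentParser(description="Benchmark a compiled module")
    parser.add_argument("-k", dest="benchmark_kernels", action="store_true",
                        help="benchmark each kernel")
    parser.add_argument("-c", dest="benchmark_all_configs", action="store_true",
                        help="benchmark each autotune config for each kernel")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # argparse expands the combined short flags "-kc" into "-k -c".
    args = parse_benchmark_args(["-kc"])
    print(args.benchmark_kernels, args.benchmark_all_configs)  # True True
```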

Example command:

TORCHINDUCTOR_BENCHMARK_KERNEL=1 python /tmp/torchinductor_shunting/mj/cmjv5hyt3uq2v7beqkthcl4ul6fh2luwfzmd4tnrquworcmqz4i3.py -kc
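Conceptually, `-c` times one kernel under each of its candidate configs and reports the results. The following is a minimal CPU-only sketch of that idea, assuming a hypothetical `make_kernel(config)` factory. In this sketch, a pure-Python block-sum function stands in for a GPU kernel, and the config is its block size. It is not Inductor's implementation.

```python
import time
from typing import Callable, Dict, Hashable, Sequence, Tuple


def benchmark_all_configs(
    make_kernel: Callable[[Hashable], Callable[[], None]],
    configs: Sequence[Hashable],
    warmup: int = 3,
    rep: int = 20,
) -> Dict[Hashable, float]:
    """Time each config's kernel and return mean milliseconds per call."""
    results: Dict[Hashable, float] = {}
    for cfg in configs:
        kernel = make_kernel(cfg)
        for _ in range(warmup):  # keep one-time costs out of the measurement
            kernel()
        start = time.perf_counter()
        for _ in range(rep):
            kernel()
        results[cfg] = (time.perf_counter() - start) * 1000.0 / rep
    return results


def report(results: Dict[Hashable, float]) -> Tuple[Hashable, float]:
    """Print configs fastest-first and return the best (config, ms) pair."""
    ranked = sorted(results.items(), key=lambda kv: kv[1])
    for cfg, ms in ranked:
        print(f"{ms:.4f} ms  {cfg}")
    return ranked[0]


if __name__ == "__main__":
    data = list(range(20_000))

    def make_kernel(block: int) -> Callable[[], None]:
        # Stand-in "kernel": sum the data in chunks of size `block`.
        def kernel() -> None:
            total = 0
            for i in range(0, len(data), block):
                total += sum(data[i:i + block])
        return kernel

    best_cfg, best_ms = report(benchmark_all_configs(make_kernel, [64, 256, 1024]))
```

For real GPU kernels, the timer has to account for asynchronous launches. For that case, a CUDA-synchronized helper such as Triton's `triton.testing.do_bench` is a better fit than `time.perf_counter`.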

Sample result: [screenshot: Screenshot 2023-03-06 at 6 05 23 PM]

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

[ghstack-poisoned]

pytorch-bot · 2023-03-08T01:00:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96248

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 9ce6ad5:

NEW FAILURES - The following jobs have failed:

linux-bionic-cuda11.7-py3.10-gcc7-sm86 / test (default, 1, 4, linux.g5.4xlarge.nvidia.gpu) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ngimel · 2023-03-08T01:29:30Z

Do we need a step to first output compiled module? Can we output results of all autotuning configs as we are autotuning them for the first time, running the model?

shunting314 · 2023-03-08T01:45:37Z

> Do we need a step to first output compiled module? Can we output results of all autotuning configs as we are autotuning them for the first time, running the model?

Yes, currently we split this into two steps:

1. Generate the compiled modules.
2. Benchmark each kernel in the compiled modules.

This way, we can do step 1 once and step 2 multiple times as we tune heuristics.

If we want, we can also do step 2 on the fly while doing step 1, as you mentioned. There is one tricky part, though: autotuning results are cached, and we would want to ignore that cache when showing perf for each config. But since each model is usually run multiple times in our scripts (for warm-up or for more stable perf numbers), we would need to either:

- take care to print the autotuning results only for the first run, or
- even better, disable the autotuning cache only for the first run; later runs would not autotune at all because of cache hits.
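The second option can be modeled as follows. The first call for a kernel starts from an empty cache, benchmarks every config, and prints the timings. Later runs hit the cache, skip autotuning, and print nothing. This is a hypothetical sketch; the class and the `bench` callback are illustrative, and Inductor's real autotuner and its cache are structured differently.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


class CachingAutotuner:
    """Hypothetical sketch: report per-config timings only when autotuning runs.

    The cache starts empty, so the first run for a key always autotunes and
    prints; repeated warm-up/measurement runs reuse the cached best config.
    """

    def __init__(self, configs: Iterable[Hashable],
                 bench: Callable[[Hashable], float]) -> None:
        self.configs = list(configs)
        self.bench = bench  # returns milliseconds for one config
        self.cache: Dict[Hashable, Hashable] = {}
        self.reports: List[List[Tuple[Hashable, float]]] = []

    def best_config(self, key: Hashable) -> Hashable:
        if key in self.cache:  # cache hit: no autotuning, nothing printed
            return self.cache[key]
        timings = [(cfg, self.bench(cfg)) for cfg in self.configs]
        for cfg, ms in timings:
            print(f"{key}: {cfg} -> {ms:.3f} ms")
        self.reports.append(timings)
        best = min(timings, key=lambda t: t[1])[0]
        self.cache[key] = best
        return best


if __name__ == "__main__":
    fake_ms = {"BLOCK=64": 0.9, "BLOCK=128": 0.5, "BLOCK=256": 0.7}  # made-up timings
    tuner = CachingAutotuner(fake_ms, fake_ms.__getitem__)
    for run in range(3):  # e.g. warm-up plus measurement runs
        tuner.best_config("kernel_0")  # timings are printed only on the first run
```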

Do we want to go this route?

Chillee · 2023-03-08T02:52:08Z

torch/_inductor/utils.py

-
-        if ms > 0.012 and gb_per_s < 650:
-            print(colorama.Fore.RED + info_str + colorama.Fore.RESET)
+        def get_info_str(ms, prefix=""):


btw I'm changing this code a bit to put the kernel name at the end: #96170
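The diff flags a kernel in red when it runs longer than 0.012 ms yet achieves under 650 GB/s. The sketch below illustrates that heuristic, with the kernel name placed at the end as suggested in the review. It is a hypothetical standalone version: the `num_gb` and `kernel_name` parameters and the output layout are assumptions, and plain ANSI escape codes replace `colorama`.

```python
RED = "\033[31m"    # ANSI escape codes used here in place of colorama
RESET = "\033[0m"


def get_info_str(ms: float, num_gb: float, kernel_name: str = "", prefix: str = "") -> str:
    """Format one kernel's timing; highlight likely memory-inefficient kernels."""
    gb_per_s = num_gb / (ms / 1e3)  # bytes moved divided by runtime in seconds
    info = f"{prefix}{ms:.3f}ms \t{num_gb:.3f} GB \t{gb_per_s:7.2f}GB/s"
    if kernel_name:
        info += f" \t{kernel_name}"  # kernel name last
    # Same slow-kernel heuristic as the diff: > 0.012 ms but < 650 GB/s.
    if ms > 0.012 and gb_per_s < 650:
        return RED + info + RESET
    return info


if __name__ == "__main__":
    print(get_info_str(1.0, 0.1, "kernel_0"))   # 100 GB/s: highlighted
    print(get_info_str(1.0, 1.0, "kernel_1"))   # 1000 GB/s: plain
```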

torch/_inductor/config.py


ghstack-source-id: e75fc8c Pull Request resolved: #96248

ngimel · 2023-03-08T19:08:56Z

Yeah I think two-step process is fine for now, but for the future the disabling cache for the first run that you described would add nice convenience.
The reason I want regular runs to be able to print stats is because sometimes the produced output_code is non-runnable or gives skewed performance because of the random inputs, and it's nice to be able to run on "real" inputs.

shunting314 · 2023-03-08T19:15:48Z

> Yeah I think two-step process is fine for now, but for the future the disabling cache for the first run that you described would add nice convenience. The reason I want regular runs to be able to print stats is because sometimes the produced output_code is non-runnable or gives skewed performance because of the random inputs, and it's nice to be able to run on "real" inputs.

Makes sense. You also pointed out earlier that inputs may alias each other in ways that randomly generated inputs cannot capture. We can certainly improve these if they turn out to cause problems.

shunting314 · 2023-03-09T00:45:49Z

@pytorch merge -f "the test_tensorboard failure is unrelated"

shunting314 · 2023-03-09T00:46:28Z

@pytorchbot merge -f "the test_tensorboard failure is unrelated"


shunting314 · 2023-03-09T22:22:14Z

@pytorchbot merge

pytorchmergebot · 2023-03-09T22:23:58Z

Can't merge closed PR #96248

…r a kernel (#96458) Pull Request resolved: #96458 Approved by: https://github.com/ngimel

[inductor] show performance for each autotune config for a kernel

ade001e

[ghstack-poisoned]

shunting314 mentioned this pull request Mar 8, 2023

[inductor] show more kernel specific metrics in the benchmark result #96249

Closed

github-actions bot added ciflow/inductor module: inductor labels Mar 8, 2023

shunting314 mentioned this pull request Mar 8, 2023

[inductor] show performance for each autotune config for a kernel #96162

Closed

shunting314 added the topic: not user facing topic category label Mar 8, 2023

shunting314 requested review from Chillee and ngimel March 8, 2023 01:04

Chillee approved these changes Mar 8, 2023


shunting314 added a commit that referenced this pull request Mar 8, 2023

[inductor] show performance for each autotune config for a kernel

fa40cb0

ghstack-source-id: e75fc8c Pull Request resolved: #96248

ngimel approved these changes Mar 8, 2023


shunting314 merged commit bc8f9f2 into gh/shunting314/23/base Mar 9, 2023

shunting314 deleted the gh/shunting314/23/head branch March 9, 2023 19:12

shunting314 restored the gh/shunting314/23/head branch March 9, 2023 19:15

shunting314 added a commit that referenced this pull request Mar 9, 2023

Revert "[inductor] show performance for each autotune config for a ke…

c917441

…rnel (#96248)" This reverts commit bc8f9f2.

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 9, 2023


shunting314 merged 2 commits into gh/shunting314/23/base from gh/shunting314/23/head


4 participants
