[inductor] show more kernel specific metrics in the benchmark result#96249
shunting314 wants to merge 4 commits into gh/shunting314/24/base
Conversation
…ark result"

Show the following kernel specific metrics in the kernel benchmark:
- shared memory used
- number of registers used
- number of spills

Depends on triton-lang/triton#1296

cc soumith voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire
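To make the metrics above concrete, here is a minimal sketch of how they might be read off a compiled Triton kernel and formatted for the benchmark output. The attribute names (`shared`, `n_regs`, `n_spills`) are assumptions based on the Triton PR referenced above, and the kernel object below is a stand-in, not a real `triton.compile(...)` result.

```python
# Sketch: format kernel-specific resource metrics for the benchmark line.
# Attribute names are assumptions from the Triton PR; the kernel object
# here is a stand-in for a real compiled Triton kernel.
from types import SimpleNamespace


def kernel_metrics(compiled_kernel):
    """Return a short string with kernel-specific resource metrics."""
    return (
        f"{compiled_kernel.shared} shared mem, "
        f"{compiled_kernel.n_regs} regs, "
        f"{compiled_kernel.n_spills} spills"
    )


# Stand-in for a compiled kernel exposing the three metrics:
fake_kernel = SimpleNamespace(shared=4096, n_regs=32, n_spills=0)
print(kernel_metrics(fake_kernel))  # 4096 shared mem, 32 regs, 0 spills
```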
Hmm... I think we might need a way to configure what info is shown for each kernel 🤔 I'm not sure we always want this info?
yea, I'll add an option to control those kernel details in the output
@shunting314 I don't think we need a flag necessarily. I think we should just turn it on when benchmarking kernels and not with the TORCHINDUCTOR_PROFILE path.
Ah, I misunderstood that then. So currently we have 2 places to print kernel bandwidth.
So we won't see the kernel details in the output for TORCHINDUCTOR_PROFILE. One improvement is that we could create a single API to print bandwidth information and have both cases call it, with printing kernel details made optional. Let me know if you want me to do that.
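A rough sketch of the single printing API proposed above: both the per-kernel benchmark path and the TORCHINDUCTOR_PROFILE path would call one function, and only the benchmark path would pass the extra kernel details. All names and the line format here are illustrative, not the actual inductor API.

```python
# Hypothetical unified bandwidth-printing helper; the function name,
# signature, and line format are illustrative assumptions.
def bandwidth_line(kernel_name, gb_per_s, metrics=None):
    """Build one bandwidth report line; kernel details are opt-in."""
    line = f"{gb_per_s:7.2f} GB/s  {kernel_name}"
    if metrics is not None:  # only the kernel-benchmark path passes metrics
        line += f"  ({metrics})"
    return line


# TORCHINDUCTOR_PROFILE-style call (no kernel details):
print(bandwidth_line("triton_fused_add_0", 812.5))
# Kernel-benchmark-style call (with details):
print(bandwidth_line("triton_fused_add_0", 812.5, metrics="32 regs, 0 spills"))
```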
Yeah that would be nice. Also, I'm wondering if you'll have some issues landing this since you need a corresponding Triton bump. Might need to guard on the attributes existing.
Good question lol. I feel TORCHINDUCTOR_BENCHMARK_KERNEL is already a guard: people can use the kernel benchmarks only when that env var is enabled. If they enable the env var but use a stale Triton version, we will indeed show some attribute-not-found errors. I can improve this a bit: print a warning asking the user to update their Triton version.
It would be good to add a test for this stuff though, just to make sure it doesn't get broken by a random change.
yea, will add one
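Illustrative shape such a regression test could take (all names and the line format are hypothetical, not the actual test that landed): run the benchmark printer, capture stdout, and check the metric fields show up so a random change can't silently drop them.

```python
# Hypothetical regression-test shape: capture the benchmark output and
# assert the new metric fields are present. The printer is a stand-in.
import contextlib
import io


def emit_benchmark_line():
    # stand-in for the real per-kernel benchmark output
    print("0.012ms  0.004GB  812.50GB/s  4096 shared mem, 32 regs, 0 spills")


buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    emit_benchmark_line()
out = buf.getvalue()
assert "regs" in out and "spills" in out
print("test passed")  # test passed
```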
Due to a merge issue (partially because of the diff train SEV) I have to reland this as #96461. Sorry for the trouble.
…nchmark result (#96461)

Pull Request resolved: #96461
Approved by: https://github.com/ngimel

Stack from ghstack (oldest at bottom):
Show the following kernel specific metrics in the kernel benchmark:
- shared memory used
- number of registers used
- number of spills
Depends on triton-lang/triton#1296
cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire