[inductor] [cpp] improve cache blocking for is_dynamic_M #131306

chunyuan-w · 2024-07-22T06:41:46Z

Stack from ghstack (oldest at bottom):

Performance

Models with >= 3% performance speedup are listed below:

AMP single-thread dynamic shape (measured on CPU with AMX support)

No regressions

| Model Family | Model Name | Speedup |
|--------------|------------|---------|
| torchbench | soft_actor_critic | 3% |

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]

pytorch-bot · 2024-07-22T06:41:50Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131306

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c4d84b1 with merge base 41e6534:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper_abi_compatible, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
'test/inductor/test_cuda_cpp_wrapper.py::DynamicShapesCudaWrapperCudaTests::test_cat_slice_cat_cuda_dynamic_shapes_cuda_wrapper'

This comment was automatically generated by Dr. CI and updates every 15 minutes.


chunyuan-w · 2024-07-29T01:26:18Z

@pytorchbot rebase

pytorchmergebot · 2024-07-29T01:27:45Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-07-29T01:27:50Z

Rebase failed due to Command `git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict gh/chunyuan-w/21/orig` returned non-zero exit code 1

Rebasing (1/1)
Auto-merging torch/_inductor/codegen/cpp_gemm_template.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/cpp_gemm_template.py
Auto-merging torch/_inductor/codegen/cpp_prefix.h
error: could not apply 3d05a8b6b7... [inductor] [cpp] improve cache blocking for is_dynamic_M
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 3d05a8b6b7... [inductor] [cpp] improve cache blocking for is_dynamic_M

Raised by https://github.com/pytorch/pytorch/actions/runs/10136513983


jgong5 · 2024-09-06T13:13:56Z

@pytorchbot merge

pytorchmergebot · 2024-09-06T13:15:54Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

ghstack-source-id: f8c94ed Pull Request resolved: pytorch/pytorch#131306

## Performance

Models with >= 3% performance speedup are listed below:

### AMP single-thread dynamic shape (measured on CPU with AMX support)

No regressions

| Model Family | Model Name | Speedup |
|--------------|------------|---------|
| torchbench | soft_actor_critic | 3% |

Pull Request resolved: pytorch#131306
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel
ghstack dependencies: pytorch#135275

Co-authored-by: Jiong Gong <[email protected]>

[inductor] [cpp] improve cache blocking for is_dynamic_M

88791a0

[ghstack-poisoned]

pytorch-bot bot added the ciflow/inductor and module: inductor labels Jul 22, 2024

chunyuan-w marked this pull request as draft July 22, 2024 06:42

pytorchbot added the open source label Jul 22, 2024

jgong5 requested changes Jul 22, 2024

torch/_inductor/codegen/cpp_prefix.h (3 review threads, outdated and resolved)

Update on "[inductor] [cpp] improve cache blocking for is_dynamic_M"

5c6f819

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]

chunyuan-w added a commit that referenced this pull request Jul 22, 2024

[inductor] [cpp] improve cache blocking for is_dynamic_M

4570912

ghstack-source-id: 1840729 Pull Request resolved: #131306

jgong5 approved these changes Jul 23, 2024


chunyuan-w added the topic: not user facing label Jul 23, 2024

Update on "[inductor] [cpp] improve cache blocking for is_dynamic_M"

fa62fd4

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]

chunyuan-w added a commit that referenced this pull request Jul 23, 2024

[inductor] [cpp] improve cache blocking for is_dynamic_M

3d05a8b

ghstack-source-id: 13a93cb Pull Request resolved: #131306

chunyuan-w added a commit to chunyuan-w/pytorch that referenced this pull request Jul 24, 2024

[inductor] [cpp] improve cache blocking for is_dynamic_M

3064fe4

ghstack-source-id: 13a93cb Pull Request resolved: pytorch#131306

chunyuan-w added a commit that referenced this pull request Jul 29, 2024

[inductor] [cpp] improve cache blocking for is_dynamic_M

02ca5e6

ghstack-source-id: 7ad45d9 Pull Request resolved: #131306

chunyuan-w and others added 3 commits July 29, 2024 10:00

Update

c86f039

[ghstack-poisoned]

Update

7706936

[ghstack-poisoned]

Update

b22c752

[ghstack-poisoned]

jgong5 pushed a commit that referenced this pull request Aug 14, 2024

[inductor] [cpp] improve cache blocking for is_dynamic_M

2cf34dd

ghstack-source-id: e958952 Pull Request resolved: #131306

leslie-fang-intel approved these changes Aug 14, 2024


jgong5 mentioned this pull request Aug 14, 2024

[inductor][cpp][gemm] enable dynamic M for k-slicing #133447

Closed

jgong5 marked this pull request as ready for review August 14, 2024 14:45

jgong5 mentioned this pull request Aug 15, 2024

[inductor][cpp][gemm] cache blocking config for dynamic shapes #133538

Closed

Update

f37f4df

[ghstack-poisoned]

jgong5 mentioned this pull request Aug 17, 2024

[inductor][cpp][gemm] improve cache blocking for small K and N #133755

Closed

Jiong Gong and others added 8 commits August 17, 2024 01:22

Update

4a56d11

[ghstack-poisoned]

Update

1a606ce

[ghstack-poisoned]

Update

2e2873a

[ghstack-poisoned]

Update

f20b36b

[ghstack-poisoned]

Update

6140eb2

[ghstack-poisoned]

Update

2e7f1f1

[ghstack-poisoned]

Update

77a8d69

[ghstack-poisoned]

Update

6053a0f

[ghstack-poisoned]

jgong5 mentioned this pull request Sep 5, 2024

[inductor][cpp][gemm] fix autotune runtime error from linear_binary fusion #135275

Closed

Update

9cea373

[ghstack-poisoned]

jgong5 mentioned this pull request Sep 5, 2024

[inductor][cpp][gemm] reduce memory alloc overhead by allocating local acc once per thread #135277

Closed

Jiong Gong and others added 4 commits September 5, 2024 15:45

Update

f591a60

[ghstack-poisoned]

Update

ea784be

[ghstack-poisoned]

Update

fe7b5a0

[ghstack-poisoned]

Update

c4d84b1

[ghstack-poisoned]

chunyuan-w added the ciflow/trunk label Sep 6, 2024

pytorchmergebot added the merging label Sep 6, 2024

pytorchmergebot added the Merged label Sep 6, 2024

pytorchmergebot closed this in 13bae39 Sep 6, 2024

pytorchmergebot removed the merging label Sep 6, 2024

enter-ctrl9 pushed a commit to enter-ctrl9/pytorch11 that referenced this pull request Sep 15, 2024

[inductor] [cpp] improve cache blocking for is_dynamic_M

8d1e0f5

ghstack-source-id: f8c94ed Pull Request resolved: pytorch/pytorch#131306

github-actions bot deleted the gh/chunyuan-w/21/head branch October 7, 2024 02:08
