Ensure that BlockMask length must always exactly match the sequence length in flex_attention #141625
Conversation
…ength in flex_attention [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141625
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures) As of commit c1f79fe with merge base 78491d6:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
… sequence length in flex_attention" Fixes #141435 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov [ghstack-poisoned]
flex_attention_call(*create_inputs(2048), block_mask=block_mask)

block_mask = create_block_mask(mask_mod, None, None, 1023, 1023)
with self.assertRaisesRegex(ValueError, "block_mask was created for"):
nit: stricter assert message check
How do you want to make it stricter? I mainly just wanted to check that it's throwing the right error.
In one case the inputs are smaller than the block mask size, and in the other they are bigger, so there are two different error messages. I just meant we should ensure the correct message is shown in each case; the asserts here are the same.
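For context, a minimal runnable sketch of the check under discussion, written against the public torch.nn.attention.flex_attention API; the helper names mask_mod and create_inputs, the tensor shapes, and the CPU device are illustrative assumptions, not the test file's actual definitions:

```python
# Hedged sketch of the suggestion above: pin each mismatch direction to its
# own message instead of sharing one regex. mask_mod/create_inputs and the
# CPU device are illustrative stand-ins for this test file's helpers.
import unittest

import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention


def mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx  # simple causal mask


def create_inputs(seq_len):
    # q, k, v of shape (B, H, seq_len, head_dim)
    return tuple(torch.randn(1, 1, seq_len, 16) for _ in range(3))


class BlockMaskLengthTest(unittest.TestCase):
    def test_mismatch_messages(self):
        block_mask = create_block_mask(mask_mod, None, None, 1023, 1023, device="cpu")
        # Inputs longer than the mask.
        with self.assertRaisesRegex(ValueError, "block_mask was created for"):
            flex_attention(*create_inputs(2048), block_mask=block_mask)
        # Inputs shorter than the mask. A stricter test would replace the
        # shared prefix above with the full (distinct) message for each case.
        with self.assertRaisesRegex(ValueError, "block_mask was created for"):
            flex_attention(*create_inputs(512), block_mask=block_mask)


if __name__ == "__main__":
    unittest.main()
```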
drisspg left a comment
🍻
Probably best to wait for diff train unblocking before landing.
@pytorchbot revert -m "Broken main" -c nosignal
See https://hud.pytorch.org/pytorch/pytorch/commit/795f28ac552eb61d02ea02fd64637ba814133bd8 for failures.
@pytorchbot successfully started a revert job. Check the current status here.
@Chillee your PR has been successfully reverted.
…quence length in flex_attention (#141625)" This reverts commit 795f28a. Reverted #141625 on behalf of https://github.com/albanD due to Broken main ([comment](#141625 (comment)))
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -i
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged while ignoring the following 2 checks: inductor / cuda12.4-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable), inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ength in flex_attention (pytorch#141625) Fixes pytorch#141435 Pull Request resolved: pytorch#141625 Approved by: https://github.com/drisspg ghstack dependencies: pytorch#138788
Stack from ghstack (oldest at bottom):
Fixes #141435
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
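For reference, a minimal sketch of the behavior this PR enforces, using the public torch.nn.attention.flex_attention API; the shapes and CPU device are illustrative assumptions:

```python
# After this PR, a BlockMask built for one sequence length can no longer be
# applied to inputs of a different length; flex_attention raises a ValueError
# whose message starts with "block_mask was created for" (per the test diff).
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention


def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx


# BlockMask built for Q_LEN = KV_LEN = 1024.
block_mask = create_block_mask(causal, None, None, 1024, 1024, device="cpu")

q = k = v = torch.randn(1, 1, 2048, 16)  # sequence length 2048 != 1024
try:
    flex_attention(q, k, v, block_mask=block_mask)
except ValueError as e:
    print(e)  # the mismatch is now a hard error instead of silent misuse
```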