Fix a bunch of stride issues with FlexAttention #130160
Conversation
…rds and accept arbitrary strides for gradOut [ghstack-poisoned]
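Accepting arbitrary strides for gradOut means the backward can no longer assume a contiguous layout. As a rough, pure-Python sketch (not PyTorch source; the shape, dim names, and helper are illustrative only) of what strides encode and why a transposed tensor violates contiguity assumptions:

```python
# Hypothetical illustration: row-major strides for a (batch, heads,
# seq_len, head_dim) tensor, and how a transpose permutes strides
# without copying data -- so gradOut from autograd need not be contiguous.

def contiguous_strides(shape):
    # C-contiguous strides, measured in elements: each dim's stride is
    # the product of all dims to its right.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

shape = (2, 8, 128, 64)           # (batch, heads, seq_len, head_dim)
print(contiguous_strides(shape))  # [65536, 8192, 64, 1]

# Swapping the last two dims permutes the strides; the innermost stride
# is no longer 1, which a kernel assuming contiguity would mishandle.
transposed = [contiguous_strides(shape)[i] for i in (0, 1, 3, 2)]
print(transposed)                 # [65536, 8192, 1, 64]
```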
```
# Sub notation for this kernel:
#
# Q: Query, K: Key, V: Value
# OUT: Forward output, LSE: logsumexp (logsumexp is always stored in fp32 regardless of the input dtype)
```
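The note that LSE is always stored in fp32 reflects standard logsumexp numerics: the exponential sum overflows easily in half precision, so the accumulator needs fp32 range. A minimal illustrative sketch in plain Python (not the actual Triton kernel):

```python
import math

def logsumexp(xs):
    # Numerically stable logsumexp: subtract the running max before
    # exponentiating so exp() stays bounded. This is the quantity the
    # kernel keeps in fp32 even when Q/K/V are fp16/bf16.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

scores = [100.0, 101.0, 102.0]  # naive sum(exp(x)) overflows fp16's ~65504 max
print(logsumexp(scores))
```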
good catch
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130160
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit a73a149 with merge base a33ee73:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
… from forwards and accept arbitrary strides for gradOut" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Compiling the `create_block_mask` function allows us to "materialize" extremely large masks. This would have been a 1 *trillion* element tensor if fully materialized.

```
print(do_bench(lambda: create_block_mask(causal_mask, 1, 1, 2**20, 2**20, _compiled=True)))
```

Pull Request resolved: #130106
Approved by: https://github.com/yanboliang
ghstack dependencies: #130160
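The arithmetic behind the "1 trillion element" figure can be sketched directly. Assuming a 128x128 block granularity for the block mask (the block size here is an assumption, not taken from this PR), the dense mask and block mask sizes compare as:

```python
# Back-of-envelope sketch: why a block mask over 2**20 x 2**20 positions
# is tractable where a dense boolean mask is not. BLOCK = 128 is an
# assumed granularity for illustration.

SEQ = 2 ** 20
BLOCK = 128

dense_elems = SEQ * SEQ                  # one entry per (q, kv) pair
blocks_per_dim = SEQ // BLOCK            # 8192
block_mask_elems = blocks_per_dim ** 2   # one entry per block pair

print(dense_elems)       # 1099511627776  (~1.1 trillion)
print(block_mask_elems)  # 67108864       (~67 million)
```

A causal mask shrinks this further, since roughly half the blocks are fully masked and can be skipped entirely.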
Pull Request resolved: #130224 Approved by: https://github.com/yanboliang ghstack dependencies: #130160, #130106
Pull Request resolved: #130227 Approved by: https://github.com/yanboliang ghstack dependencies: #130160, #130106, #130224
…tion numerics to be as accurate as FA2) (pytorch#130250) Pull Request resolved: pytorch#130250 Approved by: https://github.com/drisspg ghstack dependencies: pytorch#130160, pytorch#130106, pytorch#130224, pytorch#130227
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang