Fix a number of flexattention issues (cse, cudagraph, etc.) #145059
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145059
Note: Links to docs will display an error until the doc builds have completed.
✅ You can merge normally! (1 unrelated failure) As of commit 1b78c80 with merge base 8d91bfd.
BROKEN TRUNK - The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
  // do nothing
} else if (indices_batched && (self_bdim.has_value() || values_bdim.has_value())) {
-  auto arange_index = at::arange(0, batch_size);
+  auto arange_index = at::arange(batch_size, options.dtype(kLong));
was this borked on gpu?
This is the cause of why block mask (bm) creation wasn't cudagraphable (#143872), and probably also why getting it to compile with torch.compile made such a huge difference, haha.
Basically, aten.index actually works fine when the indices are on different devices - it just does an implicit DtoH copy. But that obviously breaks cudagraphs.
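To make that concrete, here's a minimal sketch of the behavior (assuming a CUDA device is available; `values`, `cpu_index`, and `cuda_index` are illustrative names, not the batching-rule code): the cross-device index still produces the right answer via an implicit copy, but that copy is exactly what cudagraph capture can't tolerate, which is why the arange should be created on the indexed tensor's device.

```python
import torch

# Illustrative sketch (not the actual batching-rule code): indexing a CUDA
# tensor with a CPU index tensor works via an implicit cross-device copy,
# but that copy/sync is what breaks cudagraph capture.
values = torch.randn(4, 8, device="cuda")

cpu_index = torch.arange(4)                  # like at::arange(0, batch_size): defaults to CPU
cuda_index = torch.arange(4, device="cuda")  # created on the same device as `values`

out_cpu_idx = values[cpu_index]    # correct result, but relies on an implicit copy
out_cuda_idx = values[cuda_index]  # stays on-device, so it is cudagraph-safe

assert torch.equal(out_cpu_idx, out_cuda_idx)
```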
torch/_inductor/codegen/triton.py (outdated)

-    not config.triton.codegen_upcast_to_fp32
-    and isinstance(var, CSEVariable)
-    and var.dtype in (torch.float16, torch.bfloat16)
+    return isinstance(var, CSEVariable) and var.dtype in (
did you remove the config?
I don't think it makes sense to gate this upcast on that config. The needs_upcast check is needed for correctness (in the case where we don't immediately upcast everything to fp32); the original PR presumably added the gate to limit the blast radius of the change.
cc @eellison
This is fine. Previously it was gated because there was no situation where we expected to see bf16 vars, but there's no reason we need the gate. In the general case we should be upcasting before libdevice ops which don't support bf16.
I actually ended up reverting this change, because the helper is also used for downcasting.
I don't follow - would you say more? For ops which don't support low precision, it should upcast, run the op, then downcast after.
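For reference, here is a minimal eager-mode sketch of the upcast/run/downcast pattern being discussed (the helper name `run_with_upcast` and the choice of `torch.sqrt` are illustrative assumptions, not what Inductor actually emits): for an op whose fp16/bf16 implementation isn't supported or accurate, upcast to fp32, run the op, then downcast back to the original dtype.

```python
import torch

# Illustrative only: for ops whose low-precision (fp16/bf16) implementation is
# unsupported, upcast to fp32, run the op, then downcast back to the input dtype.
def run_with_upcast(x: torch.Tensor, op=torch.sqrt) -> torch.Tensor:
    if x.dtype in (torch.float16, torch.bfloat16):
        return op(x.to(torch.float32)).to(x.dtype)  # upcast -> op -> downcast
    return op(x)

x = torch.rand(8, dtype=torch.bfloat16)
print(run_with_upcast(x).dtype)  # torch.bfloat16
```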
for arg in args:
    if isinstance(arg, str):
        # TODO: fix the flex attention instances, enable internally
        if not config.is_fbcode():
are there other fbcode issues this will pop up now?
Not sure, we'll see!
drisspg left a comment:
Lots of little nits
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov