[FlexAttention] make bm creation cuda-graphable #143872
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/143872
Note: links to docs will display an error until the docs builds have completed.
❌ 2 new failures as of commit 516bd4f with merge base a174ee2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Why is this not cudagraphable? 🤔

Yeah, this is the wrong way to solve this problem. The issue is in vmap.

I think we're missing a .sym_sizes() call somewhere in the C++ implementation for vmap. @drisspg, can you get a C++ stack trace in debug mode (so we can see function names)?
Fixed in internals of vmap |
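To illustrate the failure mode being discussed — a layer that reads concrete sizes instead of symbolic sizes will specialize on each input shape and force a recompile per shape — here is a toy Python model. This is not PyTorch internals; the `Compiler` class and its guard-key scheme are purely illustrative:

```python
# Toy model of shape specialization: not PyTorch code, just an
# illustration of why reading concrete sizes (instead of symbolic
# sizes) forces one recompilation per distinct input shape.

class Compiler:
    def __init__(self, symbolic):
        self.symbolic = symbolic  # treat dims as symbols instead of ints
        self.cache = {}           # guard key -> "compiled" artifact
        self.compiles = 0

    def run(self, shape):
        # With symbolic shapes, every rank-2 input shares one guard key;
        # with concrete shapes, each distinct (m, n) gets its own entry.
        key = ("rank", len(shape)) if self.symbolic else shape
        if key not in self.cache:
            self.compiles += 1
            self.cache[key] = f"kernel_for_{key}"
        return self.cache[key]

concrete = Compiler(symbolic=False)
symbolic = Compiler(symbolic=True)
for shape in [(4, 8), (8, 8), (16, 32)]:
    concrete.run(shape)
    symbolic.run(shape)

print(concrete.compiles)  # 3 -- recompiled for every distinct shape
print(symbolic.compiles)  # 1 -- one guard covers all rank-2 shapes
```

In this toy, a missing symbolic-size read (the analogue of calling .sizes() where .sym_sizes() was needed) is what turns the cache key concrete and triggers the per-shape recompiles the test below guards against.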
Stack from ghstack (oldest at bottom):
Summary
Addresses: #143840
Current failing dynamic-shapes test: test/inductor/test_flex_attention.py::TestBlockMask::test_compiling_create_block_mask_no_recompile fails with torch._dynamo.exc.TorchRuntimeError: Failed running call_method scatter(*(BatchedTensor(lvl=2,...
cc @zou3519 for ideas on why this is failing
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @Chillee @yanboliang @BoyuanFeng
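For context on what "cuda-graphable" means for block-mask creation, here is a toy Python model of graph capture. It uses no real CUDA and its `Graph`/kernel names are invented for illustration: during capture, kernel launches are recorded rather than run, and any operation that must synchronize with the host (e.g. reading a tensor value back) cannot be recorded:

```python
# Toy model of CUDA-graph capture (illustration only, no real CUDA):
# during capture, kernels are recorded instead of executed; an op that
# needs a host-side value (a "sync") cannot be recorded and fails.

class Graph:
    def __init__(self):
        self.recorded = []

    def capture(self, fn, *args):
        self.recorded.clear()
        fn(self._record, *args)  # run once, recording each kernel launch

    def _record(self, kernel, *args):
        if getattr(kernel, "syncs_with_host", False):
            raise RuntimeError(f"{kernel.__name__} is not capturable")
        self.recorded.append((kernel, args))

    def replay(self):
        # Re-launch the recorded kernels with no host involvement.
        return [kernel(*args) for kernel, args in self.recorded]

def add(a, b):          # a pure "kernel": no host sync, capturable
    return a + b

def item(a):            # reads a device value back to the host
    return a
item.syncs_with_host = True

g = Graph()
g.capture(lambda launch, x, y: launch(add, x, y), 2, 3)
print(g.replay())       # [5] -- pure kernels replay fine

try:
    g.capture(lambda launch, x: launch(item, x), 7)
except RuntimeError as e:
    print(e)            # item is not capturable
```

Under this model, making block-mask creation cuda-graphable amounts to ensuring its whole computation stays on the "pure kernel" side, with no host-side reads during capture.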