
Conversation


@Chillee Chillee commented Jan 17, 2025


pytorch-bot bot commented Jan 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145059

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1b78c80 with merge base 8d91bfd:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

@Chillee Chillee added the topic: not user facing topic category label Jan 17, 2025
@albanD albanD removed their request for review January 22, 2025 21:58
@Chillee Chillee requested review from drisspg and yanboliang January 27, 2025 19:10
  // do nothing
} else if (indices_batched && (self_bdim.has_value() || values_bdim.has_value())) {
-  auto arange_index = at::arange(0, batch_size);
+  auto arange_index = at::arange(batch_size, options.dtype(kLong));
Contributor:

was this borked on gpu?

Collaborator Author:

This is why bm creation wasn't cudagraphable (#143872), and probably also why fixing compilation of torch.compile made such a huge difference haha.

Basically, aten.index actually works fine even if the indices are on a different device - it just does an implicit DtoH copy. But that obviously breaks cudagraphs.
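
For illustration, a minimal eager-mode sketch of the behavior described above (assumes a CUDA device is available; the tensor names and sizes are made up):

import torch

x = torch.randn(8, 4, device="cuda")

# Eager mode tolerates CPU indices on a CUDA tensor: aten.index does an
# implicit cross-device copy, which is why the old at::arange(0, batch_size)
# (created on the default CPU device) still produced correct results.
cpu_idx = torch.arange(8)
out_eager = x[cpu_idx]

# That implicit copy/sync is what breaks CUDA graph capture; creating the
# indices on the same device (and as int64), as the fix above does, avoids it.
cuda_idx = torch.arange(8, device="cuda", dtype=torch.long)
out_graph_safe = x[cuda_idx]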

-    not config.triton.codegen_upcast_to_fp32
-    and isinstance(var, CSEVariable)
-    and var.dtype in (torch.float16, torch.bfloat16)
+    return isinstance(var, CSEVariable) and var.dtype in (
Contributor:

did you remove the config?

Collaborator Author:

I don't think it makes sense to gate this upcast on this config. This needs_upcast is needed for correctness (in the case where we don't upcast everything immediately to fp32); the original PR presumably added the gate to limit the blast radius of the change.
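
For context, a simplified sketch of the gated vs. ungated predicate from the diff above; FakeVar and the boolean flag here are stand-ins for Inductor's CSEVariable and config.triton.codegen_upcast_to_fp32, not the actual implementation:

import torch

class FakeVar:
    # Stand-in for Inductor's CSEVariable, which carries a dtype.
    def __init__(self, dtype: torch.dtype):
        self.dtype = dtype

def needs_upcast_gated(var, codegen_upcast_to_fp32: bool) -> bool:
    # Old behavior: only treat the value as needing an upcast when the
    # global fp32-upcast config is off.
    return (
        not codegen_upcast_to_fp32
        and isinstance(var, FakeVar)
        and var.dtype in (torch.float16, torch.bfloat16)
    )

def needs_upcast_ungated(var) -> bool:
    # Proposed behavior: upcast whenever the value is low precision,
    # regardless of the config.
    return isinstance(var, FakeVar) and var.dtype in (torch.float16, torch.bfloat16)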

Contributor:

This is fine. Previously it was gated because there was no situation where we expected to see bf16 vars, but there's no reason we need the gate. In the general case we should be upcasting before libdevice ops which don't support bf16.

Collaborator Author:

I actually ended up reverting this change because it's also used for downcasting.

Contributor:

I don't follow, would you say more? For ops which don't support low precision, it should upcast, run the op, then downcast after.
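
As a rough illustration of that upcast, run, downcast pattern in plain eager code (the helper name and the choice of torch.erf are just examples, not Inductor's actual codegen):

import torch

def run_upcasted(op, x: torch.Tensor) -> torch.Tensor:
    # Upcast low-precision inputs to fp32, run the op, then cast the
    # result back to the original dtype.
    if x.dtype in (torch.float16, torch.bfloat16):
        return op(x.to(torch.float32)).to(x.dtype)
    return op(x)

y = run_upcasted(torch.erf, torch.randn(4, dtype=torch.bfloat16))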

for arg in args:
    if isinstance(arg, str):
        # TODO: fix the flex attention instances, enable internally
        if not config.is_fbcode():
Contributor:

are there other fbcode issues this will pop up now?

Collaborator Author:

Not sure, we'll see!

@drisspg drisspg left a comment (Contributor)

Lots of little nits

Chillee added a commit that referenced this pull request Jan 29, 2025

Chillee commented Jan 29, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 29, 2025
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here
