[CP] Introduce flex_cp_forward custom op for FlexAttention CP #163185
Conversation
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/163185.
✅ No failures as of commit db5a09d with merge base d41aa18. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Comment with id 3396438666 not found. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…orch#165039) No logic change, just polish the docstrings, comments and remove unused variables Pull Request resolved: pytorch#165039 Approved by: https://github.com/XilunWu ghstack dependencies: pytorch#162542, pytorch#164500, pytorch#163185
…h#163185) The custom op will fetch the required K and V. Currently, the forward pass is just an all-gather, and the backward pass is a reduce-scatter. While the logic is the same as all_gather_tensor_autograd, the custom op avoids the Autograd warning that wait_tensor() is registered to autograd. For the next step, we should explore how to interpolate the required communication based on the information from BlockMask. Pull Request resolved: pytorch#163185 Approved by: https://github.com/XilunWu ghstack dependencies: pytorch#162542, pytorch#164500
Stack from ghstack (oldest at bottom):
The custom op will fetch the required K and V. Currently, the forward pass is just an all-gather and the backward pass is a reduce-scatter. While the logic is the same as all_gather_tensor_autograd, the custom op avoids the autograd warning about wait_tensor() being registered to autograd.
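For readers unfamiliar with the pattern, here is a minimal sketch of a custom op with this forward/backward pairing, built with torch.library.custom_op and the (private) functional collectives. It is not the code from this PR: the op name `flex_cp_demo::all_gather_kv`, the use of the default process group, and the plain `seq_dim` argument are illustrative assumptions.

```python
# Minimal sketch, not the implementation in this PR. Assumes torch >= 2.4,
# an initialized default process group, and K/V sharded along seq_dim.
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


@torch.library.custom_op("flex_cp_demo::all_gather_kv", mutates_args=())
def all_gather_kv(shard: torch.Tensor, seq_dim: int) -> torch.Tensor:
    # Forward: gather the local K/V shard from every CP rank along seq_dim.
    out = funcol.all_gather_tensor(
        shard.contiguous(), gather_dim=seq_dim, group=dist.group.WORLD
    )
    if isinstance(out, funcol.AsyncCollectiveTensor):
        out = out.wait()  # sync inside the op; autograd only sees the opaque op
    return out


@all_gather_kv.register_fake
def _(shard: torch.Tensor, seq_dim: int) -> torch.Tensor:
    # Shape propagation for tracing: the gathered dim grows by the world size.
    shape = list(shard.shape)
    shape[seq_dim] *= dist.get_world_size()
    return shard.new_empty(shape)


def _setup_context(ctx, inputs, output):
    _, seq_dim = inputs
    ctx.seq_dim = seq_dim


def _backward(ctx, grad_output):
    # Backward: reduce-scatter the full-sequence gradient back to a local shard.
    grad = funcol.reduce_scatter_tensor(
        grad_output.contiguous(), "sum", scatter_dim=ctx.seq_dim, group=dist.group.WORLD
    )
    if isinstance(grad, funcol.AsyncCollectiveTensor):
        grad = grad.wait()
    return grad, None  # no gradient for the non-tensor seq_dim argument


torch.library.register_autograd(
    "flex_cp_demo::all_gather_kv", _backward, setup_context=_setup_context
)
```

Because the collective and its wait both happen inside the op body and the backward is attached via register_autograd, autograd only ever records the opaque custom op, which is how this pattern sidesteps the wait_tensor() registration warning.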
As a next step, we should explore how to derive the required communication from the information in the BlockMask.
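To illustrate that direction (again only a sketch, not part of this PR), the BlockMask already records which KV blocks each query block attends to, so one could enumerate the KV blocks that are actually referenced and, given the CP sharding layout, fetch only those instead of all-gathering everything. The attribute names below (`kv_indices`, `kv_num_blocks`, `full_kv_indices`, `full_kv_num_blocks`) are the public BlockMask fields; the helper itself is hypothetical.

```python
import torch
from torch.nn.attention.flex_attention import BlockMask


def referenced_kv_blocks(block_mask: BlockMask) -> torch.Tensor:
    """Hypothetical helper: sorted, unique KV block indices that any
    (batch, head, q_block) actually attends to, counting partial and full blocks."""
    # kv_indices: [B, H, q_blocks, kv_blocks]; kv_num_blocks: [B, H, q_blocks]
    idx, cnt = block_mask.kv_indices, block_mask.kv_num_blocks
    valid = torch.arange(idx.shape[-1], device=idx.device) < cnt.unsqueeze(-1)
    needed = idx[valid]
    if block_mask.full_kv_indices is not None:
        fidx, fcnt = block_mask.full_kv_indices, block_mask.full_kv_num_blocks
        fvalid = torch.arange(fidx.shape[-1], device=fidx.device) < fcnt.unsqueeze(-1)
        needed = torch.cat([needed, fidx[fvalid]])
    return needed.unique()
```

Mapping block indices to owning CP ranks depends on how the sequence is sharded (e.g. load-balanced round-robin vs. contiguous chunks), which is presumably part of what the follow-up exploration would have to work out.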
cc @H-Huang @awgu @wanchaol @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci