Conversation

@eternalNight (Contributor)

PyTorch C++ op schema [1] allows specifying tensor storage aliasing by annotating `(a)` after input/output types. Torch inductor uses this information to determine where to insert explicit `del` statements for tensors that are no longer needed.

If an op's schema disagrees with its implementation, inductor-generated code is likely to release tensors earlier than expected, leading to wrong results.

`wait_allgather` and `release_param` return the first argument unchanged, and that aliasing should be annotated in the schema.

Also remove the code related to `clone_custom_op_output`, as it is solely a workaround for the aforementioned issue.

[1] https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md

Fixes: #7596
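For illustration, a sketch of the kind of schema annotation described above. The actual argument lists of these DeepSpeed ops are not shown in this thread, so the signatures below are assumptions; only the `(a)` alias markers are the point:

```
# Hypothetical schema sketch (argument lists assumed, not taken from the PR):
#
# Before: the schema declares a fresh output, so inductor may assume the
# input's storage is no longer referenced and free it too early.
#   wait_allgather(Tensor self, ...) -> Tensor
#
# After: `(a)` on both the input and the output declares that they alias
# the same storage, so inductor keeps the input alive with the output.
#   wait_allgather(Tensor(a) self, ...) -> Tensor(a)
```

With the alias declared, inductor's liveness analysis treats the input and output as one storage, matching what the op implementation actually does.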

Signed-off-by: Junjie Mao <[email protected]>
@tohtana (Collaborator) left a comment:
Wow, amazing finding! I really appreciate your deep investigation.
Thank you for the great fix!

@tohtana tohtana enabled auto-merge (squash) September 29, 2025 02:19
@tohtana tohtana merged commit 6fcccfa into deepspeedai:master Sep 29, 2025
12 checks passed
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025
Successfully merging this pull request may close these issues:

[BUG] Schema of DeepCompile C++ ops does not reflect aliases