Conversation

@eternalNight (Contributor)

PyTorch C++ op schema [1] allows specifying tensor storage aliasing by annotating `(a)` after input/output types. Torch inductor uses this information to determine where to insert explicit `del` statements for tensors that are no longer needed.

If an op's schema disagrees with its implementation, inductor-generated code is likely to release tensors earlier than expected, leading to wrong results.

`wait_allgather` and `release_param` return the first argument unchanged, and that aliasing should be annotated in the schema.

Also remove the code related to `clone_custom_op_output`, as it is solely a workaround for the aforementioned issue.

[1] https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md

Fixes: #7596
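For illustration, a sketch of the kind of schema annotation described above. The actual argument lists of these DeepSpeed ops are not shown in this thread, so the signatures below are assumptions; only the `(a)` alias markers are the point:

```
# Hypothetical schema sketch (argument lists assumed, not taken from the PR):
#
# Before: the schema declares a fresh output, so inductor may assume the
# input's storage is no longer referenced and free it too early.
#   wait_allgather(Tensor self, ...) -> Tensor
#
# After: `(a)` on both the input and the output declares that they alias
# the same storage, so inductor keeps the input alive with the output.
#   wait_allgather(Tensor(a) self, ...) -> Tensor(a)
```

With the alias declared, inductor's liveness analysis treats the input and output as one storage, matching what the op implementation actually does.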

Signed-off-by: Junjie Mao <[email protected]>
@tohtana (Collaborator) left a comment:
Wow, amazing finding! I really appreciate your deep investigation.
Thank you for the great fix!

@tohtana tohtana enabled auto-merge (squash) September 29, 2025 02:19
@tohtana tohtana merged commit 6fcccfa into deepspeedai:master Sep 29, 2025
12 checks passed
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025
Successfully merging this pull request may close these issues:

[BUG] Schema of DeepCompile C++ ops does not reflect aliases