@eternalNight
Contributor

The list of torch no-copy ops is hard-coded and does not include all operations that may alias their inputs in their outputs.

Instead of using a fixed list, iterate over all ops under torch.ops.aten and identify those with aliasing behavior by inspecting their schema.

With PyTorch 2.7.1, the ops whose default overload is identified by the updated logic include:

  • _nested_view_from_buffer
  • _reshape_alias
  • alias
  • as_strided
  • conj
  • detach
  • diagonal
  • expand
  • imag
  • lift_fresh
  • narrow
  • permute
  • pin_memory
  • positive
  • real
  • reshape
  • squeeze
  • t
  • unfold
  • unsqueeze
  • view
  • view_as_complex
  • view_as_real
  • most operations whose name ends with an underscore
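
For illustration only, the schema inspection described above could be sketched as follows. This is not the code in this PR: the helper names `schema_may_alias` and `aliasing_overloads` are hypothetical, and the sketch assumes that each overload exposes its schema as `_schema`, that op packets list their overload names via `overloads()`, and that schema arguments and returns carry an `alias_info` attribute that is `None` when no aliasing is declared. How the full set of op names under `torch.ops.aten` is enumerated is left to the caller.

```python
def schema_may_alias(schema) -> bool:
    """Return True if any argument or return value in the schema carries alias info."""
    values = list(schema.arguments) + list(schema.returns)
    # Assumption: `alias_info` is None for values that do not alias anything.
    return any(getattr(v, "alias_info", None) is not None for v in values)


def aliasing_overloads(namespace, op_names):
    """Collect the overloads under `namespace` whose schema declares aliasing.

    `namespace` is expected to behave like torch.ops.aten, and `op_names`
    is an iterable of op packet names such as "view" or "add_".
    """
    found = []
    for op_name in op_names:
        packet = getattr(namespace, op_name)
        for overload_name in packet.overloads():
            overload = getattr(packet, overload_name)
            if schema_may_alias(overload._schema):
                found.append(overload)
    return found
```

With PyTorch installed, a call such as `aliasing_overloads(torch.ops.aten, ["view", "detach"])` would be one way to try it on a few known ops.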

@sfc-gh-truwase sfc-gh-truwase requested review from tohtana and removed request for loadams September 15, 2025 15:07
@tohtana
Collaborator

tohtana commented Sep 15, 2025

Hi @eternalNight,
I really appreciate your deep understanding of the code. This fix is awesome.
I wonder if we could show a message saying that performance might not be optimal when an old PyTorch version is used and ops do not have a schema. Can you add it somewhere?

@eternalNight
Contributor Author

> Hi @eternalNight, I really appreciate your deep understanding of the code. This fix is awesome. I wonder if we could show a message saying that performance might not be optimal when an old PyTorch version is used and ops do not have a schema. Can you add it somewhere?

Good point. I should have tested the logic on different PyTorch versions.

I just tried Python 3.8.20 with PyTorch 2.0.0 and confirmed that all ops under torch.ops.aten have a schema and that the logic in this PR works as expected. So older PyTorch versions (no older than 2.0.0, since torch.compile had not been officially introduced before then) should not be a problem.

As for future versions in which the schema of some ops might be missing, to be conservative we should assume that such ops MAY reuse tensor storage. I'll update the except case to include such ops and to print a warning about the potential performance impact.
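
A rough sketch of that conservative fallback, under stated assumptions: the function name `may_reuse_storage` is hypothetical, a missing schema is assumed to surface as an `AttributeError` when reading `_schema`, and the warning text is illustrative. The actual except case in this PR may differ.

```python
import warnings


def may_reuse_storage(op) -> bool:
    """Conservatively decide whether `op` may return a tensor that shares input storage."""
    try:
        schema = op._schema
    except AttributeError:
        # Assumption: an op without a schema raises AttributeError here.
        # Without a schema, aliasing cannot be ruled out, so assume it happens.
        warnings.warn(
            f"{op!r} has no schema; assuming it may alias its inputs, "
            "which may impact performance."
        )
        return True
    values = list(schema.arguments) + list(schema.returns)
    # Assumption: `alias_info` is None for values that do not alias anything.
    return any(getattr(v, "alias_info", None) is not None for v in values)
```

Treating an unknown op as aliasing errs on the safe side: the result stays correct, at a possible cost in performance.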

@eternalNight eternalNight force-pushed the eternalNight/full_no_copy_ops_list branch from c96ddfc to 9db926e Compare September 16, 2025 04:06
@tohtana
Collaborator

tohtana commented Sep 16, 2025

Hi @eternalNight,
Thank you for sharing the insight. As DeepCompile intercepts an internal function of a backward pass, it depends on specific versions of PyTorch. I think we need to consider only >=v2.6.

@tohtana tohtana enabled auto-merge (squash) September 16, 2025 05:11
auto-merge was automatically disabled September 16, 2025 06:29

Head branch was pushed to by a user without write access

@eternalNight eternalNight force-pushed the eternalNight/full_no_copy_ops_list branch from 9db926e to 21cee09 Compare September 16, 2025 06:29
The list of torch no-copy ops is hard-coded and does not include all
operations that may alias their inputs in their outputs.

Instead of using a fixed list, iterate over all ops under torch.ops.aten
and identify those with aliasing behavior by inspecting their schema.
All ops under torch.ops.aten have a schema since PyTorch 2.0.0, but just
in case, ops without a schema are conservatively assumed to have
aliasing behavior and a warning will be printed in such cases.

With PyTorch 2.7.1, the ops whose default overload is identified by the
updated logic include:

  - _nested_view_from_buffer
  - _reshape_alias
  - alias
  - as_strided
  - conj
  - detach
  - diagonal
  - expand
  - imag
  - lift_fresh
  - narrow
  - permute
  - pin_memory
  - positive
  - real
  - reshape
  - squeeze
  - t
  - unfold
  - unsqueeze
  - view
  - view_as_complex
  - view_as_real
  - most operations whose name ends with an underscore

Signed-off-by: Junjie Mao <[email protected]>
@eternalNight eternalNight force-pushed the eternalNight/full_no_copy_ops_list branch from 21cee09 to de8a258 Compare September 16, 2025 06:31
@tohtana tohtana merged commit 2d84be8 into deepspeedai:master Sep 16, 2025
13 of 14 checks passed
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025