Skip to content

Conversation

@IvanKobzarev
Copy link
Contributor

@IvanKobzarev IvanKobzarev commented Oct 23, 2024

@IvanKobzarev IvanKobzarev requested a review from bdhirsh as a code owner October 23, 2024 19:04
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138731

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c46cf0c with merge base 10a34dc (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev added a commit that referenced this pull request Oct 24, 2024
ghstack-source-id: 6930589
Pull Request resolved: #138731
return x

is_subclass: bool = is_traceable_wrapper_subclass(x)
mem_format = memory_format[0] if is_subclass else memory_format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So right now we have two separate functions for doing "coercion" of tangents:

(1) coerce non-contiguous tangents to be contiguous

(2) coerce tangents with incorrect subclass metadata to have the right subclass metadata

It seems strange to add the logic for handling AsyncCollectiveTensor to the first one. So I would rather put it in the second (which might require tweaking the contiguous() coercion to run later), or in the worst case, merging the two pieces of coercion code.

memory_format_for_dense_tensor = not isinstance(memory_format, list)
if isinstance(x, AsyncCollectiveTensor) and memory_format_for_dense_tensor:
# This is AsyncCollectiveTensor, that we have not seen during tracing time.
while True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I don't think it's possible to end up with an AsyncCollectiveTensor(AsyncCollectiveTensor(...) (that would probably a bug with the subclass itself). So I'd rather just not have a while loop here. Is there a place where you saw this show up?

@Chillee
Copy link
Collaborator

Chillee commented Oct 24, 2024

I don't really like that we need to shove all of this stuff into AOTAutograd. Can we register this with the subclass in some manner?

mem_format = memory_format
if is_subclass:
memory_format_for_dense_tensor = not isinstance(memory_format, list)
if isinstance(x, AsyncCollectiveTensor) and memory_format_for_dense_tensor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once you do the refactor to merge all of the tangent coercion logic into one function, you should be able to use CompiledFunction.metadata.traced_tangent_metas to know when you need to do a wait() call here instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should use subclass_type information

ga, gb, gc = torch.autograd.grad(out_buffer.sum(), (a, b, c))

def test_unwrap_async_collective_tensor_tangent(self):
class WrapSC(torch.Tensor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's starting to feel like we have a lot of one-off subclasses across our tests. Any reason not to just re-use TwoTensor here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's starting to feel like we have a lot of one-off subclasses across our tests. Any reason not to just re-use TwoTensor here?

I was thinking to extract it to some utils module. Will try to use TwoTensor, I think it should work, I will try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TwoTensor works, removed redundant subclass in the test.

Copy link
Contributor

@bdhirsh bdhirsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping to unblock Andrew, agreed offline that we should do a refactor to merge all of the tangent coercion logic into one place (it's a bit weird to be doing AsyncCollectiveTensor unwrapping right now in the code that is supposed to make tangents contiguous)

IvanKobzarev added a commit that referenced this pull request Oct 24, 2024
ghstack-source-id: 8464402
Pull Request resolved: #138731
@IvanKobzarev
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 25, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@IvanKobzarev
Copy link
Contributor Author

@pytorchbot merge -f "skip unrelated bazel build failures"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 29, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 30, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 30, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Nov 1, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Nov 1, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
@IvanKobzarev
Copy link
Contributor Author

Abandoning in favor of #139095

IvanKobzarev added a commit that referenced this pull request Nov 6, 2024
…h expected_type for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Nov 6, 2024
…for e.g.AsyncCollectiveTensor"


Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case 
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.


cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Nov 7, 2024
…ncCollectiveTensor (#139095)

Based on discussion here: #138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.

Pull Request resolved: #139095
Approved by: https://github.com/bdhirsh
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…ncCollectiveTensor (pytorch#139095)

Based on discussion here: pytorch#138731

Introducing ability for subclass implement type convertion to expected_type.
```
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[Type] = None
    ):
```
Here if `expected_type=None` means `SubclassClass` is expected.

E.g. for `DTensor` we may find tangent `AsyncCollectiveTensor` where we expected `Tensor` - in this case
`expected_type=Tensor` will be called during runtime

Adding implementation to AsyncCollectiveTensor, that just triggers `wait()`.

Pull Request resolved: pytorch#139095
Approved by: https://github.com/bdhirsh
@github-actions github-actions bot deleted the gh/IvanKobzarev/79/head branch December 6, 2024 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants