Skip to content

Conversation

@fffrog
Copy link
Collaborator

@fffrog fffrog commented Sep 24, 2024

- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Sep 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136519

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3d68218 with merge base 7f88bf9 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: mps Release notes category labels Sep 24, 2024
@fffrog fffrog added topic: not user facing topic category ciflow/xpu Run XPU CI tasks labels Sep 24, 2024
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

[ghstack-poisoned]
@fffrog fffrog changed the title Make Context to be Device-agnostic Make Context to be Device-agnostic Step by Step (1/N) Sep 24, 2024
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

[ghstack-poisoned]
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

[ghstack-poisoned]
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

[ghstack-poisoned]
@fffrog fffrog marked this pull request as ready for review September 25, 2024 01:28
@fffrog fffrog removed the request for review from kulinseth September 25, 2024 01:28
@clee2000
Copy link
Contributor

@pytorchbot revert -m "breaking internal tests related to MITA, @ezyang has a forward fix?" -c ghfirst

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 15, 2024
pytorchmergebot added a commit that referenced this pull request Oct 15, 2024
@pytorchmergebot
Copy link
Collaborator

@fffrog your PR has been successfully reverted.

@ezyang
Copy link
Contributor

ezyang commented Oct 15, 2024

I'm going to help, it's just one small problem now

@ezyang
Copy link
Contributor

ezyang commented Oct 15, 2024

I'm going to attempt to fix this directly, but a way to fix this from first principles in OSS, is to not rename any of the virtual methods on the hooks classes, and have a new implementation of init() in the subclass interfaces we let people subclass from that call the old method by default. What is done in this PR is cleaner but it's BC breaking and concretely what I have to do is rename initMTIA to init internally.

KnAwnime pushed a commit to KnAwnime/Biblioteka that referenced this pull request Oct 16, 2024
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

ghstack-source-id: 3cb7e64
Pull Request resolved: pytorch/pytorch#136519
@fffrog
Copy link
Collaborator Author

fffrog commented Oct 16, 2024

My principle is to make Context.h and DeviceInterface related files clearer and easier to understand, but the changes in the current pr do introduce BC breaking, which should not be the case for a widely used project like PyTorch.

a way to fix this from first principles in OSS, is to not rename any of the virtual methods on the hooks classes

I agree with your point of view, so I updated the PR and made the following changes:

  • For the in-tree backend, the implementation of device initialization is moved to the newly added init(), and the call of the old interface initCUDA() will be forwarded to init, while ensuring the code logic is as clear and usable as possible, and BC is also guaranteed
  • For the out-of-tree backend, in order to ensure BC as much as possible, no changes are made to initMTIA(), but it is forwarded to initMITA() in the new interface init(), and a new deprecation prompt message is added

[ghstack-poisoned]
[ghstack-poisoned]
@ezyang
Copy link
Contributor

ezyang commented Oct 16, 2024

It turns out that the BC fix is pretty easy. I'm going to land my own copy as I need to also land an fbcode change

@ezyang
Copy link
Contributor

ezyang commented Oct 16, 2024

ezyang added a commit to ezyang/pytorch that referenced this pull request Oct 17, 2024
Summary:
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Original pull request: pytorch#136519

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4a8e49389c33934234dc89616fd17a58e760e2e7

Reviewed By: malfet

Differential Revision: D64471142
pytorchmergebot pushed a commit that referenced this pull request Oct 17, 2024
)

Summary:
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Original pull request: #136519

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4a8e49389c33934234dc89616fd17a58e760e2e7

Reviewed By: malfet

Differential Revision: D64471142

Pull Request resolved: #138155
Approved by: https://github.com/malfet, https://github.com/bobrenjc93
@fffrog
Copy link
Collaborator Author

fffrog commented Oct 18, 2024

@ezyang I will close this PR because the PR has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/mps Run MPS tests (subset of trunk) ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks Merged oncall: distributed Add this issue/PR to distributed oncall triage queue open source release notes: mps Release notes category Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.