-
Notifications
You must be signed in to change notification settings - Fork 563
Introduce XLAGraphExecutor #4270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
80d0cbd to
59cf84d
Compare
|
It looks like upstream has broken our PJRT_DEVICE=CPU python test/dynamo/test_dynamo.py test. This PR passes all the tests last Friday but then starts failing after a rebase. I tested it locally with the same XLA commit but a) last Friday's upstream ToT, and b) today's upstream ToT, and a passes the test while b fails the test. |
JackCaoG
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you have a chance to run resnet50 on TPU with this change? I think this pr does not introduce any functional change but would be good to check no speed regression is introduced.
I will be surprised if it does. Let me do a quick double check. |
|
Here are the results I just ran on tpu v3-8 with PJRT: First row is without the patch, and the second row is with the patch. So no performance difference. |
Summary:
This pull request moves all graph executor related parts out from XLATensor into XLAGraphExecutor such that it matches the format in upstream and therefore makes it easier to inherit the upstream LazyTensor and LazyGraphExecutor later.
A few changes to notice:
5.1. XLATensor(std::shared_ptr data) are made public such that DeviceContextArena::GetLiveTensors can access it.
5.2. XLAGraphExecutor is made to be a friend class of XLATensor in order to access some of the later's private methods/members.
Test Plan:
CI.