Restore real inputs for recompilation #7356
Conversation
Signed-off-by: Masahiro Tanaka <[email protected]>
Signed-off-by: Masahiro Tanaka <[email protected]>
@tohtana, what are the known failure cases of this solution?
@sfc-gh-truwase I can think of these two cases:
Thanks for the explanation. I have two thoughts:
@sfc-gh-truwase, Thank you for your comment! For the two cases I mentioned, the following can happen:
An alternative approach is to offload and keep the real inputs. Since there is a tradeoff between CPU memory consumption and the stability/accuracy of profiling, we could give the user the choice.
Signed-off-by: Masahiro Tanaka <[email protected]>
@sfc-gh-truwase Thanks for the feedback! I've extended the `InputStorage` to cover a wider variety of scenarios. Here is the summary:
For both options, we offload the real tensors to host memory.
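As a rough illustration (a minimal sketch, not the actual DeepSpeed code; the helper names below are hypothetical), offloading an input tensor to host memory and restoring it later could look like this:

```python
# Hypothetical sketch: offload a real input tensor to host memory so the GPU
# copy can be freed, and restore it to its original device when needed again.
import torch

def offload_to_host(t: torch.Tensor) -> torch.Tensor:
    # Detach and copy to CPU; the GPU memory can then be released.
    return t.detach().to("cpu")

def restore_to_device(host: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Move the saved copy back to the original device for recompilation.
    return host.to(device)
```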
This PR keeps some of the real inputs given to the custom backend for DeepCompile.
DeepCompile expects the custom backend at the TorchFX graph level to be called whenever recompilation happens. In some cases, however, only the Aten-level backend is called. Since the Aten-level backend uses real inputs saved by the TorchFX-level backend, we need to keep the real inputs around for recompilation.
Currently we discard the real inputs after the Aten-level backend uses them, because they are often too large to keep in GPU memory. This causes an error when recompilation calls only the Aten-level backends, since we never get a chance to record new real inputs in the TorchFX-level backend.
This PR instead always keeps only tensor metadata and non-tensor data on the CPU and materializes the tensors when needed (i.e., when recompilation happens and only Aten-level backends are called without real inputs). Since we use dummy data to materialize the tensors, this solution might still fail in some cases, but it improves coverage.
The new module `InputStorage` keeps tensor metadata and non-tensor data for this purpose and materializes tensors.
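To make the mechanism concrete, here is a hedged sketch of the idea behind `InputStorage` (the class and method names are illustrative, not the actual DeepSpeed API): keep only tensor metadata plus non-tensor values, and materialize dummy tensors when the real inputs are no longer available.

```python
# Illustrative sketch of the InputStorage idea (hypothetical API, not the
# actual DeepSpeed implementation): record lightweight metadata for tensor
# inputs and rebuild dummy tensors from it when recompilation needs inputs.
from dataclasses import dataclass
from typing import Any, List, Tuple
import torch

@dataclass
class TensorMeta:
    shape: Tuple[int, ...]
    dtype: torch.dtype
    device: torch.device

class InputStorageSketch:
    def __init__(self) -> None:
        self._records: List[Any] = []

    def record(self, args) -> None:
        # Keep only metadata for tensors; non-tensor values are stored as-is.
        self._records = [
            TensorMeta(tuple(a.shape), a.dtype, a.device) if torch.is_tensor(a) else a
            for a in args
        ]

    def materialize(self) -> List[Any]:
        # Recreate tensors from metadata with dummy contents (zeros here).
        return [
            torch.zeros(m.shape, dtype=m.dtype, device=m.device)
            if isinstance(m, TensorMeta) else m
            for m in self._records
        ]
```

Because the materialized tensors hold dummy values, any behavior that depends on the actual input data can still diverge, which is the residual limitation noted above.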