Conversation


tohtana commented Jun 14, 2025

This PR keeps some of the real inputs given to the custom backend for DeepCompile.

DeepCompile expects that the custom backend at the TorchFX graph level is always called when recompilation happens. In some cases, however, only the Aten-level backend is called. As the Aten-level backend uses real inputs saved by the TorchFX-level backend, we need to keep the real inputs for recompilation.

Currently we discard the real inputs after the Aten-level backend uses them, as they are often too large to keep in GPU memory. This causes an error when recompilation calls only the Aten-level backends, because we never get a chance to record new real inputs in the TorchFX-level backend.

This PR always keeps tensor metadata and non-tensor data on CPU and materializes the tensors when needed (i.e. when recompilation happens and only Aten-level backends are called without real inputs). As we use dummy data to materialize tensors, this solution might still not work in some cases, but it improves coverage.
The new module `InputStorage` keeps tensor metadata and non-tensor data for this purpose and materializes the tensors. A minimal sketch of the idea follows.
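
Below is a minimal sketch of the idea behind `InputStorage` (class and method names here are illustrative, not the actual DeepSpeed implementation): record only metadata for tensor inputs plus the raw values of non-tensor inputs, then rebuild dummy tensors on demand.

```python
import torch

class InputStorageSketch:
    """Illustrative stand-in for the InputStorage module added by this PR."""

    def __init__(self):
        self._records = []

    def save(self, args):
        self._records = []
        for a in args:
            if isinstance(a, torch.Tensor):
                # Keep only metadata; the real data is discarded to save memory.
                self._records.append(("tensor", a.shape, a.dtype, a.device))
            else:
                # Non-tensor inputs (ints, bools, ...) are cheap to keep as-is.
                self._records.append(("value", a))

    def materialize(self):
        # Rebuild inputs for recompilation; tensors are filled with ones,
        # which is why index-consuming ops can still misbehave.
        args = []
        for rec in self._records:
            if rec[0] == "tensor":
                _, shape, dtype, device = rec
                args.append(torch.ones(shape, dtype=dtype, device=device))
            else:
                args.append(rec[1])
        return args
```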

tohtana requested review from loadams and tjruwase as code owners June 14, 2025 02:15
@sfc-gh-truwase

> …recompilation happens and only Aten-level backends are called without real inputs). As we use dummy data to materialize tensors, this solution might still not work but improves the coverage.

@tohtana, what are the known failure cases of this solution?


tohtana commented Jun 15, 2025

> @tohtana, what are the known failure cases of this solution?

@sfc-gh-truwase I can think of these two cases:

  • Operators that take indices (e.g. embedding, scatter): they will throw an error if given indices that exceed the target tensor's size. In this PR, we fill the dummy tensors with 1 (like torch.ones), so embedding should be okay in most cases. If we encounter this issue, it would be good to add an option for finer-grained control of the behavior (e.g. saving only int tensors, expecting that large activation tensors are mostly float).
  • torch.where, torch.nonzero: the output shapes of these operators can change depending on the input values. We will need a different approach to address this (e.g. gather inputs from all ranks and run through them until we get a stable graph). A short PyTorch illustration of both failure modes is shown below.
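
A quick illustration of both failure modes in plain PyTorch (this snippet is for explanation only and is not part of the PR):

```python
import torch

# 1) Index-consuming ops: an index >= num_embeddings raises at runtime.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
emb(torch.ones(3, dtype=torch.long))   # fine: dummy index 1 is in range
# emb(torch.tensor([42]))              # IndexError: index out of range in self

# 2) Data-dependent output shapes: nonzero's result size depends on values,
#    so dummy inputs can produce shapes the real run would never see.
print(torch.nonzero(torch.tensor([0, 1, 0, 1])).shape)  # torch.Size([2, 1])
print(torch.nonzero(torch.ones(4)).shape)               # torch.Size([4, 1])
```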

@sfc-gh-truwase

> @sfc-gh-truwase I can think of these two cases:

Thanks for the explanation. I have two thoughts:

  1. My understanding is that these limitations stem from runtime properties of the execution rather than static properties of the code. If so, they are normal limitations of the compiler approach, and we might want to skip compilation for these cases instead of failing silently.
  2. If my understanding is incorrect, can we document these limitations in the code until we are able to handle them?


tohtana commented Jun 17, 2025

@sfc-gh-truwase, thank you for your comment!
I currently don't think the dummy values affect correctness, as they are used only for profiling.

For the two cases I mentioned, the following can happen:

  • Operators that take indices (e.g. embedding, scatter): if an invalid value is given, they throw an error.
  • Operators that produce dynamic shapes depending on inputs (torch.where, torch.nonzero): the output might cause an error in subsequent operators during profiling, or yield inaccurate profiling results that lead to non-optimal graph modification.

An alternative approach is to offload the real inputs and keep them. As there is a tradeoff between CPU memory consumption and the stability/accuracy of profiling, we could give the user the choice. A rough sketch of the offloading path is shown below.
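
A rough sketch of that alternative, under the assumption that the host copy is pinned for faster transfers (the helper names are hypothetical):

```python
import torch

def offload_to_host(t: torch.Tensor):
    # Keep the real values on the host; pinning speeds up the copy back.
    host = torch.empty(t.shape, dtype=t.dtype, device="cpu",
                       pin_memory=torch.cuda.is_available())
    host.copy_(t)
    return host, t.device

def restore(host: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Copy the saved values back to the original device for profiling.
    return host.to(device, non_blocking=True)
```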


tohtana commented Jun 19, 2025

@sfc-gh-truwase Thanks for the feedback! I've extended `InputStorage` to cover a wider variety of scenarios.

Here is the summary:

  • Enhanced InputStorage to keep real values for integer tensors by default:
    • Added keep_int_input_tensors: bool = True config option (enabled by default)
    • Integer tensors (indices, masks, etc.) now preserve their actual values instead of using dummy ones
    • This addresses correctness issues with operators like embedding and scatter that rely on valid indices
  • Added option to keep all input tensors:
    • Added keep_all_input_tensors: bool = False config option for comprehensive tensor preservation
    • Useful for debugging or cases where dummy values cause issues with any tensor type

For both options, we offload the real tensors to host memory. A rough sketch of how these options could drive the keep/discard decision is shown below.
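
Only the option names below come from this PR; the helper itself is a hypothetical illustration of the decision logic:

```python
import torch

def should_keep_real_values(t: torch.Tensor,
                            keep_int_input_tensors: bool = True,
                            keep_all_input_tensors: bool = False) -> bool:
    if keep_all_input_tensors:
        return True
    # Non-float tensors (indices, masks) are usually small but
    # value-sensitive, so their real data is kept rather than faked.
    return keep_int_input_tensors and not t.is_floating_point()

def save_input(t: torch.Tensor, **opts):
    if should_keep_real_values(t, **opts):
        return ("real", t.detach().to("cpu"))      # offload real values to host
    return ("meta", t.shape, t.dtype, t.device)    # metadata only; dummy later
```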

tjruwase merged commit 6f1a1c0 into master Jun 19, 2025
9 of 10 checks passed
tjruwase deleted the tohtana/keep_real_inputs_for_recompile branch June 19, 2025 12:52
Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025