Conversation

@ezyang (Contributor) commented Dec 2, 2024

Stack from ghstack (oldest at bottom):

I am going to break apart the arguments passed to the constituents
to only pass exactly what is needed, so easy access to the insides
is helpful here.

This also moves two helper functions to output_code.py as well.

Also set _boxed_call at constructor.
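
For context, the boxed calling convention means the compiled callable receives all of its inputs as a single list, so the runtime can clear input references early. A minimal illustrative sketch of the idea (not the actual inductor code):

```python
from typing import Any, Callable, List


def make_boxed(fn: Callable[..., Any]) -> Callable[[List[Any]], Any]:
    """Wrap an unboxed callable so it takes a single list of args."""

    def boxed(args: List[Any]) -> Any:
        out = fn(*args)
        args.clear()  # drop input references as early as possible
        return out

    boxed._boxed_call = True  # marker attribute that callers check before dispatching
    return boxed
```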

Signed-off-by: Edward Z. Yang [email protected]

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

[ghstack-poisoned]
ezyang requested a review from bdhirsh as a code owner December 2, 2024 15:33
@pytorch-bot bot commented Dec 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141877

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 6fb06f0 with merge base 7c1d5db:

NEW FAILURE - The following job has failed:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

result._boxed_call = True
# TODO: How come cudagraphs could be None here?
# TODO: How come gm is None here?
result.post_compile(example_inputs, fx_config["cudagraphs"], None) # type: ignore[arg-type]
Contributor

gm is only used for @masnesral's specific freezing use case; in that case, we need the post-compile gm to do things. But we don't have that for AOTAutogradCache, so AOTAutogradCache doesn't support the freezing workflow. It always passes in None here, since the gm should never be needed for post compile.

(The only "gm" it has access to at this step is the post-dynamo gm, so anything needed to load from FxGraphCache has to be obtainable from that.)

Contributor Author

Hmm, are we talking about the same code? I see here in post_compile:

        if cudagraphs:
            # It's possible that cudagraphs is enabled, but was disabled
            # during a previous compilation we're loading from the cache.
            # If so, we need to disable it on this new process too.
            if self.disabled_cudagraphs_reason:
                if "cuda" in self.device_types:
                    log_cudagraph_skip_and_bump_counter(
                        f"skipping cudagraphs due to {self.disabled_cudagraphs_reason}"
                    )
                else:
                    counters["inductor"]["cudagraph_skips"] += 1
                BoxedBool.disable(cudagraphs)
            else:
                cudagraph_post_compile(
                    example_inputs,
                    self,
                    cudagraphs,
                    gm,
                )

So gm is needed to do some CUDA graphs stuff. I don't see how AOTAutograd cache proves that cudagraphs could never be enabled here. I can try to write a test case that tickles it... later, maybe.
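
A rough sketch of the kind of test that could tickle this path (hypothetical: the config flag names here are assumptions from memory, not a verified repro):

```python
import torch
import torch._functorch.config as functorch_config
import torch._inductor.config as inductor_config


# Assumed flags: freezing + cudagraphs on, AOTAutogradCache enabled.
@inductor_config.patch({"freezing": True, "triton.cudagraphs": True})
@functorch_config.patch({"enable_autograd_cache": True})
def run_repro() -> None:
    mod = torch.nn.Linear(8, 8).cuda()
    opt = torch.compile(mod, mode="reduce-overhead")
    x = torch.randn(2, 8, device="cuda")
    with torch.no_grad():
        # On a warm cache hit, post_compile would run with gm=None
        # while cudagraphs is still enabled.
        opt(x)
```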

Contributor

Ah yeah, it's confusing because cudagraph_post_compile only uses gm if freezing was enabled at cache-write time and CompiledFxGraph doesn't already have the constants on hand (which it always will in AOTAutogradCache scenarios).

IMO we should change this so the gm is not passed to cudagraph_post_compile, only the constants themselves, and instead have each caller of cudagraph_post_compile handle obtaining the constants from the gm.

The gm field is really confusing to work with because it is essentially untyped: it can represent both a dynamo gm and a post-aotdispatch gm.
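
To make that concrete, a minimal self-contained sketch of the proposed shape (all names here are hypothetical, not the actual inductor API):

```python
from typing import Any, Dict


def cudagraph_post_compile(constants: Dict[str, Any]) -> None:
    """Hypothetical narrowed hook: consumes constants, never a GraphModule."""
    for name in constants:
        print(f"recording constant {name!r} for cudagraph replay")


class FakeGraphModule:
    """Stand-in for torch.fx.GraphModule, for this sketch only."""

    def __init__(self, constants: Dict[str, Any]) -> None:
        self._constants = constants

    def named_constants(self) -> Dict[str, Any]:
        return dict(self._constants)


def post_compile_from_fx_cache(gm: FakeGraphModule) -> None:
    # The FxGraphCache caller owns the gm (freezing case), so it
    # resolves the constants itself before calling the hook.
    cudagraph_post_compile(gm.named_constants())


def post_compile_from_aot_cache(stored_constants: Dict[str, Any]) -> None:
    # The AOTAutogradCache caller already has the constants on the
    # cache entry, so no gm is needed anywhere on this path.
    cudagraph_post_compile(stored_constants)
```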

Contributor

@masnesral It's really confusing what's happening here, so I might refactor it some more: we shouldn't call CompiledFxGraph.get_constants() with an optional gm and change its behavior depending on some unspoken invariants. We should only call this helper function if we know freezing is on in the cache entry and we're not coming from AOTAutogradCache.

The way it's written now, there's no way to tell that this gm is always None on the exact code path where None is correct, and a real value otherwise.
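
A hedged sketch of what that explicit guard could look like (the entry attributes and flag are illustrative assumptions, not real fields):

```python
from typing import Any, Dict, Optional


def resolve_constants(
    entry: Any,  # stand-in for a CompiledFxGraph cache entry
    gm: Optional[Any],  # dynamo GraphModule, or None on the AOT cache path
    from_aot_autograd_cache: bool,
) -> Dict[str, Any]:
    # Consult the gm only when freezing was on at cache-write time and
    # we did not come from AOTAutogradCache; otherwise the constants
    # must already live on the entry and gm is expected to be None.
    if entry.freezing_enabled and not from_aot_autograd_cache:
        assert gm is not None, "freezing path requires the dynamo gm"
        return entry.get_constants(gm)
    return entry.constants
```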

Comment on lines -313 to +436
FxGraphCache.post_compile(self, example_inputs, cudagraphs, gm)
This runs whether or not we have a cache hit, and always runs directly after we get a CompiledFxGraph.
Contributor

I see, this refactor is nice because OutputGraph is at a layer higher than the cache, so it can do the post-compilation unconditionally. Without OutputCode, this refactor would be hard to write, because there would be two different ways to obtain all the needed inputs: through compile or through a cache hit.

That said, maybe I'm missing context from the last stack, but I don't remember how the deserialize step converts the stuff saved into the cache into an OutputCode. I assume we still need something like that, to massage the cached result into the object we return from inductor.

Contributor Author

Actually, you could have always done this refactor; I just forgot to update this call site when I initially introduced post_compile. It's not doing much.

> That said, maybe I'm missing context from the last stack, but I don't remember how the deserialize step converts the stuff saved into the cache into an OutputCode. I assume we still need something like that, to massage the cached result into the object we return from inductor.

I'm not exactly sure what your question here is. But CompiledFxGraph is directly unpickled from the cache:

        def iterate_over_candidates() -> Generator[CompiledFxGraph, None, None]:
            if local:
                subdir = FxGraphCache._get_tmp_dir_for_key(key)
                if os.path.exists(subdir):
                    for path in sorted(os.listdir(subdir)):
                        try:
                            with open(os.path.join(subdir, path), "rb") as f:
                                yield pickle.load(f)
                        except Exception:
                            log.warning(
                                "fx graph cache unable to load compiled graph",
                                exc_info=True,
                            )

And, contrary to expectation, the current_callable is directly bong'ed in by codecache.py

        try:
            with dynamo_timed(
                "PyCodeCache.load_by_key_path", log_pt2_compile_event=True
            ):
                graph.current_callable = PyCodeCache.load_by_key_path(
                    graph.cache_key,
                    artifact_path,
                    graph.cache_linemap,
                    graph.get_constants(gm),
                ).call

This part I do plan to refactor as well but I haven't gotten to it yet.
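
For what it's worth, one hedged sketch of how that might look (hypothetical shape, not what this PR does): let the cache entry hydrate its own callable instead of codecache.py assigning current_callable from outside.

```python
from typing import Any, Callable, Optional


class CompiledFxGraphSketch:
    """Toy stand-in for CompiledFxGraph, for this sketch only."""

    def __init__(self, cache_key: str, artifact_path: str) -> None:
        self.cache_key = cache_key
        self.artifact_path = artifact_path
        self.current_callable: Optional[Callable[..., Any]] = None

    def hydrate(self, loader: Callable[[str, str], Any]) -> None:
        # The entry loads its own compiled module and installs the
        # entry point, rather than having a caller reach in.
        self.current_callable = loader(self.cache_key, self.artifact_path).call
```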

ezyang added the ciflow/trunk and topic: not user facing labels Dec 2, 2024
@ezyang (Contributor, Author) commented Dec 2, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@jamesjwu (Contributor) commented Dec 3, 2024

Double-checked that the tests that were failing are now passing, along with TestAOTAutogradWithCache, which was failing in the beginning. Going to attempt one more time to land this.

python test/functorch/test_aotdispatch.py TestAOTAutogradWithCache -v

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 0 checks:


@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 jobs have failed, first few of them are: torchbench / cuda12.1-py3.10-gcc9-sm80 / test (torchbench_gcp_smoketest, 1, 1, awsa100.linux.gcp.a100)


@jamesjwu (Contributor) commented Dec 3, 2024

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

@pytorchmergebot (Collaborator)

Successfully rebased gh/ezyang/3034/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/141877)

eellison removed their request for review December 3, 2024 23:06
@jamesjwu (Contributor) commented Dec 4, 2024

Don't think the last error is related, so will try this one more time.

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 2 checks: torchbench / cuda12.1-py3.10-gcc9-sm80 / test (torchbench_gcp_smoketest, 1, 1, awsa100.linux.gcp.a100), inductor / cuda12.4-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable)


pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

…rch#141877)

Pull Request resolved: pytorch#141877
Approved by: https://github.com/jamesjwu, https://github.com/jansel

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

…te (pytorch#141877)"

This reverts commit 6153439.

Reverted pytorch#141877 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but a lot of failures show up after this lands ([comment](pytorch#141877 (comment)))

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

…rch#141877)

Pull Request resolved: pytorch#141877
Approved by: https://github.com/jamesjwu, https://github.com/jansel
Co-authored-by: James Wu <[email protected]>

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

…te (pytorch#141877)"

This reverts commit 3ab4a28.

Reverted pytorch#141877 on behalf of https://github.com/huydhn due to Jobs are failing en masse after this lands, so it looks like a land race ([comment](pytorch#141877 (comment)))

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

…rch#141877)

Pull Request resolved: pytorch#141877
Approved by: https://github.com/jamesjwu, https://github.com/jansel
Co-authored-by: James Wu <[email protected]>

AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024

…rch#141877)

Pull Request resolved: pytorch#141877
Approved by: https://github.com/jamesjwu, https://github.com/jansel
Co-authored-by: James Wu <[email protected]>

Esquains pushed three commits to Esquains/study1 that referenced this pull request Dec 15, 2024 (ghstack-source-ids: 91491ea, a59f210, de15fd6)

Pull Request resolved: pytorch/pytorch#141877
@github-actions github-actions bot deleted the gh/ezyang/3034/head branch January 4, 2025 02:04

Labels: ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/torchbench, ciflow/trunk (Trigger trunk jobs on your pull request), keep-going (Don't stop on first failure, keep running tests until the end), Merged, module: inductor, Reverted, topic: not user facing (topic category)
