
Conversation

@Lucaskabela (Contributor) commented Nov 4, 2025

Graph partition relies on `get_free_symbol_uses()` to collect symbol inputs (`get_scheduler_node_symbol_uses` in torch/_inductor/scheduler.py, lines 4869-4885 at ee7434b: https://github.com/pytorch/pytorch/blob/ee7434be822cf6e75b4566d8159f550ee233d8ae/torch/_inductor/scheduler.py#L4869-L4885):

```python
def get_scheduler_node_symbol_uses(
    node: BaseSchedulerNode,
) -> OrderedSet[sympy.Symbol]:
    """
    Gets symbols used in node.
    """
    if isinstance(node, FusedSchedulerNode):
        return OrderedSet().union(
            *(get_scheduler_node_symbol_uses(snode) for snode in node.snodes)
        )
    assert node.node is not None
    free_symbol_uses = node.node.get_free_symbol_uses()
    free_symbol_uses.update(
        *(get_layout_symints(ir_node) for ir_node in node.node.get_outputs())
    )
    return free_symbol_uses
```
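
For intuition, the "free symbols" being collected are SymPy symbols (for example, dynamic shape sizes such as `s0`) that appear in a node's layout and data; graph partition needs their union to know which symbolic inputs each partition takes. Below is a minimal, self-contained sketch of the same union-over-children pattern, using a hypothetical `ToyNode` class and a plain `set` in place of Inductor's real IR nodes and `OrderedSet`:

```python
import sympy

class ToyNode:
    """Hypothetical stand-in for an IR node; not Inductor's real class."""

    def __init__(self, exprs, children=()):
        self.exprs = exprs          # sympy expressions, e.g. sizes/strides
        self.children = children    # nested nodes to recurse into

    def get_free_symbol_uses(self):
        # Union the symbols of this node's own expressions ...
        result = set()
        for expr in self.exprs:
            result |= expr.free_symbols
        # ... with the symbols of every child, mirroring the FusedSchedulerNode case.
        for child in self.children:
            result |= child.get_free_symbol_uses()
        return result

s0, s1 = sympy.symbols("s0 s1")
leaf = ToyNode([s0 * s1])                  # e.g. a flattened size
root = ToyNode([s0 + 1], children=[leaf])
print(root.get_free_symbol_uses())         # {s0, s1}
```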

I empirically observed that `get_free_symbol_uses()` becomes slower as graphs grow. Specifically, I tried falling back to aten for torchtitan, which results in 10k+ aten nodes. By the time the 600th node is processed, a single call to `get_free_symbol_uses()` takes seconds.

Why? Because `get_free_symbol_uses()` may recursively call `get_free_symbol_uses()` on the nodes it wraps, so one query can fan out into many recursive calls:

pytorch/torch/_inductor/ir.py, lines 4541-4543 at ee7434b (https://github.com/pytorch/pytorch/blob/ee7434be822cf6e75b4566d8159f550ee233d8ae/torch/_inductor/ir.py#L4541-L4543):

```python
result = self.layout.get_free_symbol_uses(
    unbacked_only
) | self.data.get_free_symbol_uses(unbacked_only)
```
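
To see why this compounds, consider (as a toy model, not Inductor's actual IR) a chain of wrapper nodes where each node's `get_free_symbol_uses()` recurses into the node it wraps, much like `self.data.get_free_symbol_uses()` above. Querying the k-th node walks k levels, so querying all n nodes in the chain performs on the order of n²/2 recursive calls in total:

```python
import sympy

s0 = sympy.Symbol("s0")
CALLS = 0  # count recursive calls to make the blow-up visible

class Wrapper:
    """Hypothetical node that wraps another node, mimicking layout/data nesting."""

    def __init__(self, inner=None):
        self.inner = inner

    def get_free_symbol_uses(self):
        global CALLS
        CALLS += 1
        result = {s0}
        if self.inner is not None:
            # Recursion into the wrapped node: cost grows with chain depth.
            result |= self.inner.get_free_symbol_uses()
        return result

# Build a chain of 600 wrappers and query every node once,
# roughly what happens while walking a large graph.
node = None
chain = []
for _ in range(600):
    node = Wrapper(node)
    chain.append(node)

for n in chain:
    n.get_free_symbol_uses()

print(CALLS)  # 180300 == 600 * 601 / 2 — quadratic without caching
```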

This PR fixes the issue by caching the results of `get_free_symbol_uses()`. I validated on torchtitan that the issue is fixed.
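
The description does not spell out the caching mechanism, so purely as an illustration under stated assumptions: memoizing each node's result turns repeated traversals into constant-time lookups, which is the effect the PR is after. A standalone sketch in the spirit of the toy chain above (the real method also takes an `unbacked_only` flag, so an actual cache would key on it, and would need invalidation if a node's layout can still change; none of this is the actual implementation in #166338):

```python
import sympy

s0 = sympy.Symbol("s0")

class CachedWrapper:
    """Hypothetical memoized node; not the actual implementation from #166338."""

    def __init__(self, inner=None):
        self.inner = inner
        self._free_symbols_cache = None  # assumed cache slot

    def get_free_symbol_uses(self):
        # A real cache would also key on the `unbacked_only` argument;
        # this sketch ignores that detail for brevity.
        if self._free_symbols_cache is None:
            result = {s0}
            if self.inner is not None:
                result |= self.inner.get_free_symbol_uses()
            self._free_symbols_cache = result
        return self._free_symbols_cache

# Same 600-node chain as before: each node's first query is answered mostly
# from its child's cache, so total work is linear instead of quadratic.
node = None
chain = []
for _ in range(600):
    node = CachedWrapper(node)
    chain.append(node)

for n in chain:
    n.get_free_symbol_uses()
```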

Pull Request resolved: #166338
Approved by: https://github.com/eellison

(cherry picked from commit dfebdca)

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

@pytorch-bot (bot) commented Nov 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166994

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 97cb547 with merge base 4840a1a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Lucaskabela (Contributor, Author) commented:
cc @BoyuanFeng to review the cherry-pick conflict resolution

@BoyuanFeng (Contributor) commented:
The test fails on release/2.9 but not on the main branch. The unit test was added by the same PR to check that a FlexibleLayout does not change free symbol uses. However, the test requires #163639 to run, and I have validated that it passes when #163639 is patched in.

Since we don't want to cherry-pick #163639, this PR cherry-picks #166338 without the test test_flexible_layout_immutable_free_symbols_dynamic_shapes.

@atalman atalman merged commit 3d27d95 into release/2.9 Nov 6, 2025
119 checks passed
@github-actions github-actions bot deleted the cherrypick_166338 branch December 7, 2025 02:20
