Skip to content

We probably are allowing mutations to happen on fake tensor in VariableTracker #93610

@ezyang

Description

@ezyang

🐛 Describe the bug

VariableTracker is supposed to be immutable, so we can do checkpointing and rollback when instruction translation fails and we need to create a graph break.

However, we (indirectly) store fake tensor in VariableTracker, and fake tensor is not treated immutably. In particular, when we run nodes on fake tensor, those nodes may modify the metadata of the input fake tensor, and I don't see any logic that appropriately duplicates fake tensors so that we can safely run modifications on them.

The hypothetical bug is that we checkpoint, we then run the symbolic evaluator on squeeze_, and that mutates the fake tensor, but then we discover we need to rollback, and now subsequent code will see incorrect metadata for the input to squeeze. Discussing this with jansel, this is unlikely to happen in practice today, because the only time we rollback is right before we are generating a graph break, and at that point we're not going to actually insert any more nodes into the graph. But it will be good to get this correct, in case we end up using checkpointing more seriously in the future. More generally, we probably need some sort of lint to make sure that VariableTrackers do not actually get mutated, because they're not going to turn into real bugs immediately if you corrupt the state.

cc @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire @eellison @jansel

Error logs

No response

Minified repro

No response

Metadata

Metadata

Assignees

Labels

module: dynamotriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate module

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions