Memoize repeated nonzero calls to the same fake tensor #95399
Conversation
Signed-off-by: Edward Z. Yang <[email protected]> [ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95399
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Failure as of commit 8ad2437 (NEW FAILURES). The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This removes the need to explicitly constrain_unify `x[mask]` and `y[mask]` when mask is a boolean tensor. It's very narrow but it seems to work in practice. To invalidate the nonzero call when mutation occurs, I use version counter. I know there are ways to bypass this but I think it's good enough for now. Signed-off-by: Edward Z. Yang <[email protected]> [ghstack-poisoned]
eellison left a comment:
Nice! Reviewing the code itself and assuming the constrain(2, inf) thing has been discussed/approved elsewhere.
lower = 0
upper = guard_int(arg.numel())
If we're guarding on the numel, we might as well make lower = upper.
We're guarding on the numel of the INPUT. That won't tell you the lower bound.
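For context, a minimal sketch of what that range constraint amounts to, assuming a ShapeEnv-like object with a `create_unbacked_symint()` method and a `constrain_range()` helper (the actual names and signatures in PyTorch may differ):

```python
# Minimal sketch only; create_unbacked_symint / constrain_range are assumed
# names modeled on the PR discussion, not verified PyTorch signatures.
def fake_nonzero_bounds(shape_env, arg):
    nnz = shape_env.create_unbacked_symint()   # output size is data-dependent
    lower = 0                                  # could be zero nonzero elements...
    upper = arg.numel()                        # ...but never more than numel(arg)
    # Guarding on the INPUT's numel only pins the upper bound; it says nothing
    # about how many elements are actually nonzero, so lower cannot equal upper.
    shape_env.constrain_range(nnz, min=lower, max=upper)
    return nnz
```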
# This is unsound, but it works well in practice
# See https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#
# TODO: Add a config knob to turn off this unsound behavior
Config is easy enough to add in this PR imo
# x[mask] and y[mask]; mask.nonzero() gets repeatedly called and should
# give a consistent unbacked SymInt. It needs to be invalidated in the
# same way constant is.
# TODO: Generalize this as needed, e.g., into a trie of memos
What would a trie of memos look like here? For other operators besides nonzero?
Also, if you repeatedly do a view on the tensor, you could memo a.view(...).nonzero().
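Purely as illustration of the idea being discussed (not code from this PR), a generalized memo table might key memos by the op and its arguments and invalidate everything on mutation. Chained calls like a.view(...).nonzero() are what would push this toward a trie: memos keyed first by the view op, then by nonzero.

```python
class OpMemoTable:
    """Hypothetical sketch of a generalized memo table, not code from this PR.
    Assumes only that the tensor-like object exposes a _version counter."""

    def __init__(self, tensor):
        self._tensor = tensor
        self._memos = {}          # (op_name, args) -> memoized unbacked SymInt
        self._memo_vc = tensor._version

    def _sync(self):
        # Any mutation since the last access invalidates every memo.
        if self._memo_vc != self._tensor._version:
            self._memos.clear()
            self._memo_vc = self._tensor._version

    def get(self, op_name, args=()):
        self._sync()
        return self._memos.get((op_name, args))

    def set(self, op_name, args, value):
        self._sync()
        self._memos[(op_name, args)] = value
```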
if self._nonzero_memo_vc != self._version:
    self._nonzero_memo = None
    return None
return self._nonzero_memo
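To make the snippet above concrete, here is a self-contained sketch of the version-counter invalidation pattern it implements (simplified; the field names come from the hunk above, everything else is illustrative):

```python
class FakeTensorLike:
    """Simplified sketch of the memoization pattern in the hunk above; the real
    implementation lives on FakeTensor, this just shows the moving parts."""

    def __init__(self):
        self._version = 0          # bumped on every in-place mutation
        self._nonzero_memo = None
        self._nonzero_memo_vc = None

    @property
    def nonzero_memo(self):
        if self._nonzero_memo is None:
            return None
        # Drop the memo if the tensor has been mutated since it was recorded.
        if self._nonzero_memo_vc != self._version:
            self._nonzero_memo = None
            return None
        return self._nonzero_memo

    def set_nonzero_memo(self, symint):
        # Record the memo together with the version counter it is valid for.
        self._nonzero_memo = symint
        self._nonzero_memo_vc = self._version
```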
Would be nice to not add another invalidation mechanism. version_counter does not work in inference_mode, or with resize_.
Unrelated: would be great to make version_counter always reliable, since it's very useful (as here), even if that meant inference mode + pt2 incompatibility
I agree with you in principle, but I can't really bring myself to care right now. I can file an issue for the backlog in case this actually becomes a problem.
@pytorchbot merge -f "looks like ci outage upstream"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This removes the need to explicitly constrain_unify `x[mask]` and `y[mask]` when mask is a boolean tensor. It's very narrow but it seems to work in practice. To invalidate the nonzero call when mutation occurs, I use version counter. I know there are ways to bypass this but I think it's good enough for now. Signed-off-by: Edward Z. Yang <[email protected]> Pull Request resolved: pytorch/pytorch#95399 Approved by: https://github.com/eellison
This PR:
- adds an abstract registration API for CustomOp (CustomOp.impl_abstract) that is used for both FakeTensor and meta tensors
- deletes CustomOp.impl_meta

The user story behind this API is that it is the one-stop shop for registering implementations for data-less Tensors, i.e. FakeTensor and Meta tensor. The abstract implementation provided by the user:
- gets registered as the FakeTensor implementation AND the meta formula
- can be written like a regular meta formula. If the user decides that they need something more special (i.e. data-dependent output shape), then they are able to query a current context object (FakeTensorImplCtx) that has methods to construct new unbacked symints.

Caveats:
- we really need to make FakeTensor/FakeTensorMode public. Otherwise, there isn't a way for the user to interactively test that their abstract implementation is correct without running through large pieces of the PT2 stack (make_fx or torch.compile).
- We do not memoize the symints produced by ctx.create_unbacked_symint(). It is possible to do this in the future, but it is difficult to do soundly and I am not convinced of the utility outside of the nonzero() use case mentioned in #95399.

Public API:
- More docs will come when we actually expose this API to users by putting it in a public namespace, unless you folks want it now.
- The APIs mentioned in `__all__` are the ones that are intended to be public.

Test Plan:
- Updated existing custom_op_db operators
- Added new numpy_nonzero and numpy_nms operations that test operations that have data-dependent output shape.

Pull Request resolved: #99439
Approved by: https://github.com/ezyang
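A rough usage sketch of the flow described above, using names taken from the commit message (impl_abstract, FakeTensorImplCtx, create_unbacked_symint); the import path, decorator spelling, and how the context object is obtained are assumptions rather than confirmed API:

```python
# Rough sketch based on the commit message above; the import path, decorator
# form, and get_ctx() are assumptions, not confirmed public API.
import numpy as np
import torch
from torch._custom_op import custom_op, get_ctx  # assumed location

@custom_op("mylib::numpy_nonzero")
def numpy_nonzero(x: torch.Tensor) -> torch.Tensor:
    ...

@numpy_nonzero.impl("cpu")
def numpy_nonzero_cpu(x):
    # Real kernel: indices of nonzero elements, shape (nnz, ndim).
    return torch.from_numpy(np.stack(np.nonzero(x.numpy()), axis=1))

@numpy_nonzero.impl_abstract()
def numpy_nonzero_abstract(x):
    # Data-dependent output shape: ask the context for a fresh unbacked
    # SymInt instead of looking at (nonexistent) data.
    ctx = get_ctx()
    nnz = ctx.create_unbacked_symint()
    return x.new_empty((nnz, x.dim()), dtype=torch.long)
```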
Stack from ghstack (oldest at bottom):
This removes the need to explicitly constrain_unify `x[mask]` and `y[mask]` when mask is a boolean tensor.

Imagine you want to run `x[mask] + y[mask]`. Internally, this makes two calls to nonzero, one for each boolean masking. Without memoizing, you would get distinct sizevars for each, and the addition would fail. It's very narrow, but it seems to work in practice.
To invalidate the nonzero call when mutation occurs, I use the version counter. I know there are ways to bypass this, but I think it's good enough for now.
Signed-off-by: Edward Z. Yang [email protected]
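As an illustration of the motivating pattern (not part of the PR itself): under torch.compile, both maskings lower to nonzero() calls on the same mask, and with the memo they share one unbacked size. Whether the config flag below is needed to capture the data-dependent op depends on the PyTorch version:

```python
# Illustration of the motivating pattern; the dynamo config flag is assumed to
# be needed so the data-dependent nonzero() is captured instead of graph-breaking.
import torch
import torch._dynamo

torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(dynamic=True)
def f(x, y, mask):
    # x[mask] and y[mask] each call nonzero() on `mask` internally; memoizing
    # that call gives both the same unbacked size, so the add's shapes match.
    return x[mask] + y[mask]

x, y = torch.randn(16), torch.randn(16)
mask = x > 0
print(f(x, y, mask))
```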