-
Notifications
You must be signed in to change notification settings - Fork 26.3k
fix dynamo tracking numpy 2 ops #138686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix dynamo tracking numpy 2 ops #138686
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138686
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ No FailuresAs of commit abe1b9f with merge base 73fde0d ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot label "topic: not user facing" |
|
I'm not the right person to review this, @anijain2305 @williamwen42 do one of you want to take this? |
|
Can we also request a review from @lezcano? |
torch/_dynamo/trace_rules.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might not need this set - if we actually encounter an unsupported numpy function, I believe we would just graph break when attempting to trace the call in dynamo. You can try removing this set and seeing if the tests pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have update the PR.
I did some quick local tests. They passed.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After removing the list and allow them to graph break, it somehow created more graph breaks for the Hugging face BigBird model and failed the inductor tests.
So I added back the list and excluded those ops from being tracked. The tests then passed locally.
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
f85b3eb to
f9d007e
Compare
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
|
Maybe the the expected value just needs to be updated https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo/ci_expected_accuracy |
|
@huydhn Thank you! I will debug my changes and make sure they pass the tests before it is merged. |
|
Here is an example failed job GH job link HUD commit link I have added |
Fixes pytorch#136559 As we upgrade to NumPy 2, torch falsely filtered out `numpy.random` as unsupported in dynamo tracking. This PR changes the filtering rules to include them while keeping behavior with numpy 1 unchanged. Before this PR, the following tests failed: ``` PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_functions.py -k FunctionTests.test_numpy_random PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_unspec.py -k UnspecTests.test_to_tensor PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k FakeTensorTest.test_export_numpy PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k PropagateRealTensorsFakeTensorTest.test_export_numpy_propagate_real_tensors ``` With this PR, the supported/unsupported ops in NumPy 1 are not changed. For NumPy 2, only the `numpy.random` ops that are already supported with NumPy 1 are added to the supported list. I used the following scripts to check the differences before and after the change for both NumPy 1 & 2. The output is empty for NumPy 1 since there is no change. The output is a list of `numpy.random` that considered supported for NumPy 2. ```py from torch._dynamo import trace_rules import numpy as np def new_numpy_function_ids(): unsupported_funcs = {"seed", "ranf", "get_bit_generator", "RandomState", "set_bit_generator", "sample"} def is_supported(k, v, mod): if not callable(v): return False if not getattr(v, "__module__", None): return True if v.__module__ == mod.__name__: return True if v.__module__ == "numpy.random.mtrand" and mod.__name__== "numpy.random" and k not in unsupported_funcs: return True return False rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: for k, v in mod.__dict__.items(): if is_supported(k, v, mod): rv[id(v)] = f"{mod.__name__}.{k}" return rv def old_numpy_function_ids(): rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: rv.update( { id(v): f"{mod.__name__}.{k}" for k, v in mod.__dict__.items() if callable(v) and (getattr(v, "__module__", None) or mod.__name__) == mod.__name__ } ) return rv rv1 = set(old_numpy_function_ids().values()) rv2 = set(new_numpy_function_ids().values()) for v in (rv1 - rv2): print(v) print("****") for v in (rv2 - rv1): print(v) ``` Pull Request resolved: pytorch#138686 Approved by: https://github.com/lezcano, https://github.com/williamwen42
This reverts commit 124eac2. Reverted pytorch#138686 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I am seeing inductor failure with hf_BigBird number of graph breaks after it lands ([comment](pytorch#138686 (comment)))
This reverts commit 3b87731350752f2c3c16543fa57ff51fe22a54b5.
|
I have locally reproduced the Hugging face BigBird graph breaks and fixed it. More details on reproducing it locallyI cloned the torch benchmark repo and followed the installation instructions. python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cpu --inference --float32 --total-partitions 2 --partition-id 0 --output inference_torchbench.csv -k hf_BigBird |
|
@williamwen42 @huydhn Would you please help review and merge the PR? Thanks! |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
@pytorchbot label "topic: bug fixes" |
|
@pytorchbot label "release notes: dynamo" |
Fixes pytorch#136559 As we upgrade to NumPy 2, torch falsely filtered out `numpy.random` as unsupported in dynamo tracking. This PR changes the filtering rules to include them while keeping behavior with numpy 1 unchanged. Before this PR, the following tests failed: ``` PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_functions.py -k FunctionTests.test_numpy_random PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_unspec.py -k UnspecTests.test_to_tensor PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k FakeTensorTest.test_export_numpy PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k PropagateRealTensorsFakeTensorTest.test_export_numpy_propagate_real_tensors ``` With this PR, the supported/unsupported ops in NumPy 1 are not changed. For NumPy 2, only the `numpy.random` ops that are already supported with NumPy 1 are added to the supported list. I used the following scripts to check the differences before and after the change for both NumPy 1 & 2. The output is empty for NumPy 1 since there is no change. The output is a list of `numpy.random` that considered supported for NumPy 2. ```py from torch._dynamo import trace_rules import numpy as np def new_numpy_function_ids(): unsupported_funcs = {"seed", "ranf", "get_bit_generator", "RandomState", "set_bit_generator", "sample"} def is_supported(k, v, mod): if not callable(v): return False if not getattr(v, "__module__", None): return True if v.__module__ == mod.__name__: return True if v.__module__ == "numpy.random.mtrand" and mod.__name__== "numpy.random" and k not in unsupported_funcs: return True return False rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: for k, v in mod.__dict__.items(): if is_supported(k, v, mod): rv[id(v)] = f"{mod.__name__}.{k}" return rv def old_numpy_function_ids(): rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: rv.update( { id(v): f"{mod.__name__}.{k}" for k, v in mod.__dict__.items() if callable(v) and (getattr(v, "__module__", None) or mod.__name__) == mod.__name__ } ) return rv rv1 = set(old_numpy_function_ids().values()) rv2 = set(new_numpy_function_ids().values()) for v in (rv1 - rv2): print(v) print("****") for v in (rv2 - rv1): print(v) ``` Pull Request resolved: pytorch#138686 Approved by: https://github.com/williamwen42
Summary
Support
torch.compile()to trace throughnumpy.randomops when used with NumPy 2.Details:
Fixes #136559
As we upgrade to NumPy 2, torch falsely filtered out
numpy.randomas unsupported in dynamo tracking.This PR changes the filtering rules to include them while keeping behavior with numpy 1 unchanged.
Before this PR, the following tests failed:
With this PR, the supported/unsupported ops in NumPy 1 are not changed.
For NumPy 2, only the
numpy.randomops that are already supported with NumPy 1 are added to the supported list.I used the following scripts to check the differences before and after the change for both NumPy 1 & 2.
The output is empty for NumPy 1 since there is no change.
The output is a list of
numpy.randomthat considered supported for NumPy 2.cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec @kiukchung @lezcano