fix dynamo tracking numpy 2 ops #138686

haifeng-jin · 2024-10-23T06:14:51Z

Summary

Allow torch.compile() to trace through numpy.random ops under NumPy 2.

Details:

Fixes #136559
With the upgrade to NumPy 2, torch incorrectly filtered out numpy.random functions as unsupported for Dynamo tracing.
This PR changes the filtering rules to include them while keeping the behavior with NumPy 1 unchanged.

Before this PR, the following tests failed:

```
PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_functions.py -k FunctionTests.test_numpy_random
PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_unspec.py -k UnspecTests.test_to_tensor
PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k FakeTensorTest.test_export_numpy
PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k PropagateRealTensorsFakeTensorTest.test_export_numpy_propagate_real_tensors
```

With this PR, the supported/unsupported ops in NumPy 1 are not changed.
For NumPy 2, only the numpy.random ops that are already supported with NumPy 1 are added to the supported list.

I used the following script to check the differences before and after the change for both NumPy 1 and NumPy 2.
For NumPy 1, the output is empty since nothing changes.
For NumPy 2, the output lists the numpy.random functions that are now considered supported.

```py
from torch._dynamo import trace_rules
import numpy as np


def new_numpy_function_ids():
    # numpy.random names that must stay unsupported even though they live in numpy.random.mtrand
    unsupported_funcs = {"seed", "ranf", "get_bit_generator", "RandomState", "set_bit_generator", "sample"}

    def is_supported(k, v, mod):
        if not callable(v):
            return False
        # Missing/empty __module__: treated as belonging to `mod`, same as the old rule
        if not getattr(v, "__module__", None):
            return True
        # Defined directly in the module being scanned
        if v.__module__ == mod.__name__:
            return True
        # numpy.random entries reporting numpy.random.mtrand as their module (the NumPy 2 case)
        if v.__module__ == "numpy.random.mtrand" and mod.__name__ == "numpy.random" and k not in unsupported_funcs:
            return True
        return False

    rv = {}
    for mod in trace_rules.NP_SUPPORTED_MODULES:
        for k, v in mod.__dict__.items():
            if is_supported(k, v, mod):
                rv[id(v)] = f"{mod.__name__}.{k}"
    return rv


def old_numpy_function_ids():
    # Pre-PR rule: only callables whose __module__ is missing or equals the scanned module
    rv = {}
    for mod in trace_rules.NP_SUPPORTED_MODULES:
        rv.update(
            {
                id(v): f"{mod.__name__}.{k}"
                for k, v in mod.__dict__.items()
                if callable(v)
                and (getattr(v, "__module__", None) or mod.__name__) == mod.__name__
            }
        )
    return rv


rv1 = set(old_numpy_function_ids().values())
rv2 = set(new_numpy_function_ids().values())

# Names supported before the change but not after it
for v in (rv1 - rv2):
    print(v)
print("****")
# Names newly supported after the change
for v in (rv2 - rv1):
    print(v)
```
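To reuse this before/after check for other rule changes, one option is to parameterize it on the two predicates and return the differences instead of printing them. Below is a minimal sketch, assuming predicates with the same `(name, value, module)` signature as `is_supported` above. `diff_rules`, `make_func` and the synthetic `numpy.random` module are hypothetical and exist only so the sketch runs without NumPy or PyTorch.

```python
from types import ModuleType


def diff_rules(modules, old_pred, new_pred):
    # Collect "module.name" for every attribute each predicate accepts and
    # return (removed, added) as sorted lists. Unlike the script above, this
    # keeps every alias name instead of one name per object id.
    def collect(pred):
        return {
            f"{mod.__name__}.{name}"
            for mod in modules
            for name, v in vars(mod).items()
            if pred(name, v, mod)
        }

    old, new = collect(old_pred), collect(new_pred)
    return sorted(old - new), sorted(new - old)


def old_pred(name, v, mod):
    # Same condition as old_numpy_function_ids
    return callable(v) and (getattr(v, "__module__", None) or mod.__name__) == mod.__name__


def new_pred(name, v, mod):
    # Old rule plus numpy.random.mtrand entries, minus a (shortened) exclusion set
    from_mtrand = (
        callable(v)
        and getattr(v, "__module__", None) == "numpy.random.mtrand"
        and mod.__name__ == "numpy.random"
        and name not in {"seed"}
    )
    return old_pred(name, v, mod) or from_mtrand


def make_func(name, module):
    # Hypothetical function object with a chosen __module__
    def f():
        return None

    f.__name__ = name
    f.__module__ = module
    return f


# Hypothetical synthetic module standing in for numpy.random
fake_random = ModuleType("numpy.random")
fake_random.uniform = make_func("uniform", "numpy.random.mtrand")
fake_random.seed = make_func("seed", "numpy.random.mtrand")
fake_random.helper = make_func("helper", "numpy.random")

removed, added = diff_rules([fake_random], old_pred, new_pred)
print(removed)  # []
print(added)    # ['numpy.random.uniform']
```

An empty `removed` list is the property the PR relies on: the new rule only adds names and never drops any.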

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec @kiukchung @lezcano

pytorch-bot · 2024-10-23T06:14:55Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138686

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

✅ No Failures

As of commit abe1b9f with merge base 73fde0d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

haifeng-jin · 2024-10-28T18:05:02Z

@pytorchbot label "topic: not user facing"

zou3519 · 2024-10-29T20:54:21Z

I'm not the right person to review this, @anijain2305 @williamwen42 do one of you want to take this?

haifeng-jin · 2024-10-29T21:25:43Z

Can we also request a review from @lezcano?
Thanks!

williamwen42 · 2024-10-29T22:54:20Z

Review comment on torch/_dynamo/trace_rules.py:

You might not need this set. If we actually encounter an unsupported NumPy function, I believe we would just graph break when attempting to trace the call in Dynamo. You can try removing this set and seeing if the tests pass.

haifeng-jin · 2024-10-30

I have updated the PR.
I ran some quick local tests, and they passed.
Thanks!

haifeng-jin · 2024-11-07

After I removed the list and allowed those ops to graph break, the Hugging Face BigBird model somehow hit more graph breaks, which failed the inductor tests.

So I added the list back and excluded those ops from being traced. The tests then passed locally.
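The landed fix keeps an explicit exclusion list on top of the broader module rule. Below is a minimal sketch of that allowlist-minus-denylist pattern. It assumes the excluded names are the `unsupported_funcs` set from the diff script in the PR description; treating that as the exact final list is an assumption. `traceable_random_names`, `mtrand_func` and the synthetic namespace are hypothetical.

```python
# Assumption: copied from `unsupported_funcs` in the PR description's diff script.
EXCLUDED = frozenset(
    {"seed", "ranf", "get_bit_generator", "RandomState", "set_bit_generator", "sample"}
)


def traceable_random_names(namespace, excluded=EXCLUDED):
    # namespace: mapping of attribute name -> object, e.g. vars() of a module.
    # Keep callables whose __module__ is numpy.random.mtrand unless excluded.
    return sorted(
        name
        for name, v in namespace.items()
        if callable(v)
        and getattr(v, "__module__", None) == "numpy.random.mtrand"
        and name not in excluded
    )


def mtrand_func():
    # Hypothetical function object that reports numpy.random.mtrand as its module
    def f():
        return None

    f.__module__ = "numpy.random.mtrand"
    return f


# Hypothetical synthetic namespace standing in for vars(numpy.random)
fake_namespace = {
    "uniform": mtrand_func(),
    "normal": mtrand_func(),
    "seed": mtrand_func(),
    "RandomState": type("RandomState", (), {"__module__": "numpy.random.mtrand"}),
    "__version__": "2.0",
}

print(traceable_random_names(fake_namespace))  # ['normal', 'uniform']
```

Stateful entry points such as `seed` and the `RandomState` class stay out of the traced set, while plain sampling functions are kept.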

williamwen42 · 2024-10-31T22:12:48Z

@pytorchbot rebase

pytorchmergebot · 2024-10-31T22:14:22Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-10-31T22:14:26Z

Successfully rebased dynamo onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout dynamo && git pull --rebase)

williamwen42 · 2024-11-01T17:26:57Z

@pytorchbot merge

pytorchmergebot · 2024-11-01T17:29:12Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

huydhn · 2024-11-01T23:38:46Z

Maybe the expected value just needs to be updated: https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo/ci_expected_accuracy

haifeng-jin · 2024-11-03T00:29:38Z

@huydhn Thank you!
Would you mind sharing how I can reproduce the errors caused by the PR, for example the command for running the failed tests?

I will debug my changes and make sure they pass the tests before the PR is merged.

huydhn · 2024-11-03T02:19:50Z

Here is an example failed job: GH job link, HUD commit link.

I have added ciflow/inductor to your PR, so you can see they are failing on the PR too: #138686 (comment)

Fixes pytorch#136559
Pull Request resolved: pytorch#138686
Approved by: https://github.com/lezcano, https://github.com/williamwen42

This reverts commit 124eac2. Reverted pytorch#138686 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I am seeing inductor failure with hf_BigBird number of graph breaks after it lands ([comment](pytorch#138686 (comment)))

This reverts commit 3b87731350752f2c3c16543fa57ff51fe22a54b5.

haifeng-jin · 2024-11-07T17:36:26Z

I have locally reproduced the Hugging Face BigBird graph breaks and fixed them.
See this thread for more about the cause and the fix.

More details on reproducing it locally

I cloned the torch benchmark repo and followed its installation instructions.
I added it to the Python path and ran the following command from the PyTorch repository root:

```
python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cpu --inference --float32 --total-partitions 2 --partition-id 0 --output inference_torchbench.csv -k hf_BigBird
```

haifeng-jin · 2024-11-08T05:51:31Z

@williamwen42 @huydhn Would you please help review and merge the PR? Thanks!

williamwen42 · 2024-11-08T18:35:39Z

@pytorchbot merge

pytorchmergebot · 2024-11-08T18:37:29Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

haifeng-jin · 2024-11-14T23:43:23Z

@pytorchbot label "topic: bug fixes"

haifeng-jin · 2024-11-15T16:19:16Z

@pytorchbot label "release notes: dynamo"

Fixes pytorch#136559
Pull Request resolved: pytorch#138686
Approved by: https://github.com/williamwen42

pytorch-bot bot added the module: dynamo label Oct 23, 2024

pytorchbot added the open source label Oct 23, 2024

colesbury requested a review from zou3519 October 23, 2024 20:09

colesbury added the triaged label Oct 23, 2024

pytorch-bot bot added the topic: not user facing label Oct 28, 2024

zou3519 removed their request for review October 29, 2024 20:54

williamwen42 requested a review from lezcano October 29, 2024 22:53

williamwen42 reviewed Oct 29, 2024

lezcano previously approved these changes Oct 30, 2024

haifeng-jin force-pushed the dynamo branch from 3b87731 to f85b3eb October 30, 2024 18:15

haifeng-jin requested a review from williamwen42 October 30, 2024 18:15

williamwen42 previously approved these changes Oct 30, 2024

haifeng-jin added 2 commits October 31, 2024 22:14

fix dynamo tracking numpy 2 ops

9baf717

remove the unnecessary list of unsupported numpy ops

f9d007e

pytorchmergebot force-pushed the dynamo branch from f85b3eb to f9d007e October 31, 2024 22:14

pytorch-bot bot added the ciflow/trunk label Nov 1, 2024

pytorchmergebot added the merging label Nov 1, 2024

pytorchmergebot added the Merged label Nov 1, 2024

pytorchmergebot closed this in 124eac2 Nov 1, 2024

pytorchmergebot removed the merging label Nov 1, 2024

huydhn added the ciflow/inductor label Nov 1, 2024

Exclude the unsupported ops.

abe1b9f

This reverts commit 3b87731350752f2c3c16543fa57ff51fe22a54b5.

haifeng-jin requested a review from williamwen42 November 7, 2024 17:37

williamwen42 approved these changes Nov 8, 2024

pytorchmergebot added the merging label Nov 8, 2024

pytorchmergebot closed this in 2af5172 Nov 8, 2024

pytorchmergebot removed the merging label Nov 8, 2024

pytorch-bot bot added the topic: bug fixes label Nov 14, 2024

atalman removed the topic: not user facing label Nov 15, 2024

pytorch-bot bot added the release notes: dynamo label Nov 15, 2024

rgommers mentioned this pull request Nov 15, 2024

torch.compile errors when tracing numpy.random.uniform with numpy2 #136559

Closed

9 participants
