Skip to content

Conversation

@anijain2305
Copy link
Contributor

@anijain2305 anijain2305 commented Nov 2, 2024

Stack from ghstack (oldest at bottom):

This is a bug on the main exposed by #139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization. Not ideal, but given how serious guard eval perf has to be,
we are in the gray are of unsoundness vs performance tradeoff.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

This is a bug on the main exposed by #139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Nov 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139560

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 2ff50ef with merge base 41e4d88 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

anijain2305 added a commit that referenced this pull request Nov 2, 2024
This is a bug on the main exposed by #139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization.

ghstack-source-id: 8e654f5
Pull Request resolved: #139560
@anijain2305 anijain2305 requested review from ezyang and jansel November 2, 2024 20:10
@anijain2305 anijain2305 added ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category labels Nov 2, 2024
…g matches"


This is a bug on the main exposed by #139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization. Not ideal, but given how serious guard eval perf has to be,
we are in the gray are of unsoundness vs performance tradeoff.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames

[ghstack-poisoned]
…g matches"


This is a bug on the main exposed by #139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization. Not ideal, but given how serious guard eval perf has to be,
we are in the gray are of unsoundness vs performance tradeoff.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames

[ghstack-poisoned]
anijain2305 added a commit that referenced this pull request Nov 3, 2024
This is a bug on the main exposed by #139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization.

ghstack-source-id: 86d390f
Pull Request resolved: #139560
@anijain2305
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…ytorch#139560)

This is a bug on the main exposed by pytorch#139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization. Not ideal, but given how serious guard eval perf has to be,
we are in the gray are of unsoundness vs performance tradeoff.

Pull Request resolved: pytorch#139560
Approved by: https://github.com/jansel
@ZainRizvi
Copy link
Contributor

@pytorchbot revert -c ghfirst -m "Sorry but this seems to be breaking internal tests. Please see D65430317 for more details"

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Nov 5, 2024
…atches (#139560)"

This reverts commit e6ff07f.

Reverted #139560 on behalf of https://github.com/ZainRizvi due to Sorry but this seems to be breaking internal tests. Please see D65430317 for more details ([comment](#139560 (comment)))
@pytorchmergebot
Copy link
Collaborator

@anijain2305 your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Nov 5, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@anijain2305
Copy link
Contributor Author

@pytorchbot revert -m "internal test failures" -c nosignal

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Copy link
Collaborator

@anijain2305 your PR has been successfully reverted.

@aorenste
Copy link
Contributor

aorenste commented Nov 19, 2024

Edit - A clean rebuild seems to have fixed it.

  • This backout seems to have caused a bunch of tests to fail.

@anijain2305
Copy link
Contributor Author

relanding in #141085

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…atches (pytorch#139560)"

This reverts commit e6ff07f.

Reverted pytorch#139560 on behalf of https://github.com/ZainRizvi due to Sorry but this seems to be breaking internal tests. Please see D65430317 for more details ([comment](pytorch#139560 (comment)))
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…ytorch#139560)

This is a bug on the main exposed by pytorch#139476

We have dict tag optimization where if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
considered tensors as immutable in such scenarios. This is critical for
guard eval performance, because generally users dont change their
parameters.

If I try to remove this optimization, we see slowdowns, e.g, 3.03x to
2.95x on conv_mixer TIMM benchamrk.

So, I am adding a flag which keeps the current state but allows the
users to remove this optimization. Not ideal, but given how serious guard eval perf has to be,
we are in the gray are of unsoundness vs performance tradeoff.

Pull Request resolved: pytorch#139560
Approved by: https://github.com/jansel
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pytorchmergebot pushed a commit that referenced this pull request Dec 6, 2024
…atches (#141085)

Reland - #139560

As mentioned in #130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do because this will catch if user changes the flag in the same process for compiling two different functions.

Unfortunately, there is no easy way to trigger this segfault, so I can't write a test.

Pull Request resolved: #141085
Approved by: https://github.com/jansel

Co-authored-by: William Wen <[email protected]>
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
…atches (pytorch#141085)

Reland - pytorch#139560

As mentioned in pytorch#130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do because this will catch if user changes the flag in the same process for compiling two different functions.

Unfortunately, there is no easy way to trigger this segfault, so I can't write a test.

Pull Request resolved: pytorch#141085
Approved by: https://github.com/jansel

Co-authored-by: William Wen <[email protected]>
pytorchmergebot pushed a commit that referenced this pull request Dec 16, 2024
…atches (#141085)

Reland - #139560

As mentioned in #130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do because this will catch if user changes the flag in the same process for compiling two different functions.

Unfortunately, there is no easy way to trigger this segfault, so I can't write a test.

Pull Request resolved: #141085
Approved by: https://github.com/jansel

Co-authored-by: William Wen <[email protected]>
pytorchmergebot pushed a commit that referenced this pull request Dec 19, 2024
…atches (#141085)

Reland - #139560

As mentioned in #130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do because this will catch if user changes the flag in the same process for compiling two different functions.

Unfortunately, there is no easy way to trigger this segfault, so I can't write a test.

Pull Request resolved: #141085
Approved by: https://github.com/jansel

Co-authored-by: William Wen <[email protected]>
@github-actions github-actions bot deleted the gh/anijain2305/574/head branch December 20, 2024 02:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: dynamo Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants