Conversation

@fduwjj (Contributor) commented Nov 5, 2024

Summary: As discussed, we want to measure the time spent during pickling and unpickling.

Test Plan: CI

Reviewed By: wz337

Differential Revision: D65462767

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @wz337 @wconstab @d4l3k @c-p-i-o
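
For context, here is a minimal sketch of what a timing decorator along these lines might look like. This is illustrative only: the actual `_time_logger` used in `torch.distributed` may differ, and the `logger` setup below is an assumption.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def _time_logger(func):
    """Log the wall-clock duration of each call (illustrative sketch)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.debug("%s took %.6f s", func.__name__, elapsed)
    return wrapper
```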

@pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139757

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 64018b0 with merge base 546318e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the oncall: distributed and release notes: distributed (c10d) labels Nov 5, 2024
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D65462767

@pytorch-bot added the ciflow/trunk label Nov 5, 2024
@_time_logger
def _object_to_tensor(obj, device, group):
    f = io.BytesIO()
    _pickler(f).dump(obj)
@Skylion007 (Collaborator) commented Nov 5, 2024


If you care about pickling time here, serialize with protocol=pickle.HIGHEST_PROTOCOL; that avoids an extra copy in many cases and uses the newer pickle protocols, which are fastest. The downside is that the output won't be deserializable by older versions of Python, but all the nodes here should be on the same Python version.

Actually, the latest pickle protocol (protocol 5) was introduced in Python 3.8, so any supported version of Python should be fine here, and you could set the protocol version explicitly to 5 if you are really worried. See https://peps.python.org/pep-0574/ and https://docs.python.org/3/library/pickle.html for more info.

Suggested change:
-    _pickler(f).dump(obj)
+    _pickler(f, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)

@fduwjj (Contributor, Author) commented Nov 5, 2024


Oh really? Thanks for your suggestion, let me look into this.

@Skylion007 (Collaborator) commented:

@fduwjj Added the arg as a suggestion

@Skylion007 (Collaborator) commented Nov 5, 2024


@fduwjj Just checked: the default pickle protocol is 4 as of Python 3.8. This change updates it to 5, which is a mild but potentially free speed-up for objects that support it (like numpy arrays).
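
As a quick sanity check (illustrative snippet, not part of this PR; it assumes `_pickler` in the code under review defaults to `pickle.Pickler`, which is why `pickle.Pickler` is used directly below):

```python
import io
import pickle

# On Python 3.8+ this prints "4 5": protocol 4 is the default,
# protocol 5 (PEP 574) is the highest available.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Same shape as the code under review, with the protocol made explicit.
obj = {"rank": 0, "payload": b"x" * (1 << 20)}
f = io.BytesIO()
pickle.Pickler(f, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
print(f.getbuffer().nbytes)  # size of the serialized stream
```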

@Skylion007 (Collaborator) commented Nov 5, 2024


Also, pretty sure torch.multiprocessing already uses pickle.HIGHEST_PROTOCOL in our ForkingPickler monkeypatch wrapper :)

ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)

@fduwjj (Contributor, Author) commented:

I see. Somehow, when I tried this protocol in our other applications, I didn't see any perf improvements.

@Skylion007 (Collaborator) commented:

@fduwjj It will only really speed things up if the Python objects implement the new pickle protocol APIs to take advantage of it. It's still what should be done in cases like this one, where the pickle is for ephemeral process-to-process sharing.
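
For illustration, a minimal sketch adapted from PEP 574 showing how a type opts in to the protocol-5 out-of-band buffer API; the `ZeroCopyBytes` class is hypothetical and not part of this PR:

```python
import pickle
from pickle import PickleBuffer

class ZeroCopyBytes(bytearray):
    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose the payload as a PickleBuffer; with a buffer_callback
            # it travels out-of-band, without a copy into the pickle stream.
            return type(self)._reconstruct, (PickleBuffer(self),), None
        # Older protocols fall back to an in-band copy.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            obj = m.obj
            if type(obj) is cls:
                # Out-of-band path: the buffer still wraps the original.
                return obj
            return cls(obj)

blob = ZeroCopyBytes(b"x" * (1 << 20))
buffers = []
data = pickle.dumps(blob, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(data, buffers=buffers)
assert restored == blob  # the 1 MiB payload never entered `data`
```

Types without such a `__reduce_ex__` still pickle correctly under protocol 5; they just don't get the zero-copy benefit.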

@fegin (Contributor) left a comment


LGTM

@facebook-github-bot (Contributor) commented:

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

16 similar comments

pytorchmergebot pushed a commit that referenced this pull request Nov 8, 2024
Summary:
This fixes a regression caused by #139757: we no longer log args and kwargs directly, because if they contain a tensor or tensor subclass, converting them to a string can take a lot of time or even be unsupported.

Reviewed By: fduwjj

Differential Revision: D65490202

Pull Request resolved: #139804
Approved by: https://github.com/XilunWu
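
To make the cost concrete, here is an illustrative sketch (not the actual fix, which stops logging args and kwargs altogether) of why eager stringification hurts in a hot path:

```python
import logging

logger = logging.getLogger(__name__)

def traced(func_name, args, kwargs):
    # Eager: an f-string stringifies every argument (including large
    # tensors) on every call, even when DEBUG logging is disabled.
    # logger.debug(f"{func_name}(args={args}, kwargs={kwargs})")

    # Lazy: %-style arguments are only formatted if a handler actually
    # emits the record, so disabled DEBUG logging costs almost nothing.
    logger.debug("%s(args=%r, kwargs=%r)", func_name, args, kwargs)
```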
atalman pushed a commit to atalman/pytorch that referenced this pull request Nov 11, 2024
@izaitsevfb (Contributor) commented:

@pytorchbot revert -m "reverted internally, see D65682470" -c ghfirst

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Nov 12, 2024
@pytorchmergebot (Collaborator) commented:

@fduwjj your PR has been successfully reverted.
⚠️ This PR might contain internal changes
cc: @pytorch/pytorch-dev-infra

@pytorchmergebot added the Reverted and ci-no-td labels Nov 12, 2024
pytorchmergebot pushed a commit that referenced this pull request Nov 13, 2024
…nd tensor to object (#140414)

Originally we wanted to leverage the timer logger to measure the time spent in object-to-tensor and tensor-to-object conversion (#139757), but that change was reverted (internally) because of a performance regression. We now use a wait counter instead, which is more lightweight.

Pull Request resolved: #140414
Approved by: https://github.com/c-p-i-o, https://github.com/XilunWu, https://github.com/wz337
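
For intuition, a minimal sketch of the wait-counter idea; the real counter in PyTorch is an internal facility with its own API, so everything below is a hypothetical stand-in. The point is that the guard only accumulates a count and a total duration, with no per-call string formatting or I/O:

```python
import time
from contextlib import contextmanager

class WaitCounter:
    """Hypothetical stand-in for a lightweight wait counter."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total_ns = 0

    @contextmanager
    def guard(self):
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            self.count += 1
            self.total_ns += time.perf_counter_ns() - start

object_to_tensor_counter = WaitCounter("pg.object_to_tensor")

with object_to_tensor_counter.guard():
    pass  # e.g. the pickling work inside _object_to_tensor

print(object_to_tensor_counter.count, object_to_tensor_counter.total_ns)
```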
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
@fduwjj fduwjj closed this Dec 6, 2024

Labels

ci-no-td, ciflow/trunk, fb-exported, Merged, oncall: distributed, release notes: distributed (c10d), Reverted


8 participants