-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[logging][ez] Add timer logging for pickling and unpickle for object based collective #139757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…based collective Summary: As discussed, we want to measure the time spent during pickling and unpickle. Test Plan: CI Reviewed By: wz337 Differential Revision: D65462767
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139757
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 64018b0 with merge base 546318e ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D65462767 |
| @_time_logger | ||
| def _object_to_tensor(obj, device, group): | ||
| f = io.BytesIO() | ||
| _pickler(f).dump(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you care about pickling time here, serialize it with protocol=pickle.HIGHEST_PROTOCOL, as that will prevent an extra copy in a lot of cases and use the newer PICKLE algorithms which are fastest. The downside is that it won't be deserializable by older versions of Python, but all the nodes here should be on the same Python version.
Actually, the latest version of PICKLE was introduced in 3.8, so for any supported version of Python we should be good here and you could set the protocl version explicitly to that if you are really worried. See https://peps.python.org/pep-0574/ and https://docs.python.org/3/library/pickle.html for more info
| _pickler(f).dump(obj) | |
| _pickler(f, protocol=pickle.HIGHEST_PROTOCOL).dump(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh really, thanks for your suggestion. Let me look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fduwjj Added the arg as a suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fduwjj Just checked and default PICKLE version is now 4 as of 3.8. This should update it to 5 which is a mild, but potentially free speed up for objects which support it (like numpy arrays etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also pretty sure torch.multiprocessing already does this protocol.HIGHEST with our monkeypatch wrapper :)
pytorch/torch/multiprocessing/queue.py
Line 16 in 546318e
| ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Somehow I tried this protocol in our other applications, I didn't see any perf improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fduwjj It will only really speed up things if the Python objects implement the new pickle protocol APIs to take advantage of it. It's really what should be done though in case where the pickle is for ephemeral process to process sharing like here.
fegin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
16 similar comments
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
|
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally) |
Summary: This is trying to fix a regression caused by #139757. We now don't want to log args and kwargs directly because if they contain tensor or tensor subclass it would take lots of time in conversion to string or even not supported. Reviewed By: fduwjj Differential Revision: D65490202 Pull Request resolved: #139804 Approved by: https://github.com/XilunWu
…based collective (pytorch#139757) Summary: As discussed, we want to measure the time spent during pickling and unpickle. Test Plan: CI Reviewed By: wz337 Differential Revision: D65462767 Pull Request resolved: pytorch#139757 Approved by: https://github.com/awgu, https://github.com/Skylion007, https://github.com/fegin, https://github.com/c-p-i-o
…h#139804) Summary: This is trying to fix a regression caused by pytorch#139757. We now don't want to log args and kwargs directly because if they contain tensor or tensor subclass it would take lots of time in conversion to string or even not supported. Reviewed By: fduwjj Differential Revision: D65490202 Pull Request resolved: pytorch#139804 Approved by: https://github.com/XilunWu
|
@pytorchbot revert -m "reverted internally, see D65682470" -c ghfirst |
|
@pytorchbot successfully started a revert job. Check the current status here. |
… object based collective (#139757)" This reverts commit 41e4d88. Reverted #139757 on behalf of https://github.com/izaitsevfb due to reverted internally, see D65682470 ([comment](#139757 (comment)))
|
@fduwjj your PR has been successfully reverted. |
…nd tensor to object (#140414) Originally we want to leverage the timer logger to measure the time spent in object to tensor and tensor to object (#139757) But it gets reverted (internally) because of a performance regression. We now use wait counter instead which is more lightweight. Pull Request resolved: #140414 Approved by: https://github.com/c-p-i-o, https://github.com/XilunWu, https://github.com/wz337
…based collective (pytorch#139757) Summary: As discussed, we want to measure the time spent during pickling and unpickle. Test Plan: CI Reviewed By: wz337 Differential Revision: D65462767 Pull Request resolved: pytorch#139757 Approved by: https://github.com/awgu, https://github.com/Skylion007, https://github.com/fegin, https://github.com/c-p-i-o
…h#139804) Summary: This is trying to fix a regression caused by pytorch#139757. We now don't want to log args and kwargs directly because if they contain tensor or tensor subclass it would take lots of time in conversion to string or even not supported. Reviewed By: fduwjj Differential Revision: D65490202 Pull Request resolved: pytorch#139804 Approved by: https://github.com/XilunWu
… object based collective (pytorch#139757)" This reverts commit 41e4d88. Reverted pytorch#139757 on behalf of https://github.com/izaitsevfb due to reverted internally, see D65682470 ([comment](pytorch#139757 (comment)))
…nd tensor to object (pytorch#140414) Originally we want to leverage the timer logger to measure the time spent in object to tensor and tensor to object (pytorch#139757) But it gets reverted (internally) because of a performance regression. We now use wait counter instead which is more lightweight. Pull Request resolved: pytorch#140414 Approved by: https://github.com/c-p-i-o, https://github.com/XilunWu, https://github.com/wz337
…based collective (pytorch#139757) Summary: As discussed, we want to measure the time spent during pickling and unpickle. Test Plan: CI Reviewed By: wz337 Differential Revision: D65462767 Pull Request resolved: pytorch#139757 Approved by: https://github.com/awgu, https://github.com/Skylion007, https://github.com/fegin, https://github.com/c-p-i-o
…h#139804) Summary: This is trying to fix a regression caused by pytorch#139757. We now don't want to log args and kwargs directly because if they contain tensor or tensor subclass it would take lots of time in conversion to string or even not supported. Reviewed By: fduwjj Differential Revision: D65490202 Pull Request resolved: pytorch#139804 Approved by: https://github.com/XilunWu
… object based collective (pytorch#139757)" This reverts commit 41e4d88. Reverted pytorch#139757 on behalf of https://github.com/izaitsevfb due to reverted internally, see D65682470 ([comment](pytorch#139757 (comment)))
…nd tensor to object (pytorch#140414) Originally we want to leverage the timer logger to measure the time spent in object to tensor and tensor to object (pytorch#139757) But it gets reverted (internally) because of a performance regression. We now use wait counter instead which is more lightweight. Pull Request resolved: pytorch#140414 Approved by: https://github.com/c-p-i-o, https://github.com/XilunWu, https://github.com/wz337
Summary: As discussed, we want to measure the time spent during pickling and unpickle.
Test Plan: CI
Reviewed By: wz337
Differential Revision: D65462767
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @wz337 @wconstab @d4l3k @c-p-i-o