Conversation

@fduwjj (Contributor) commented Nov 5, 2024

Summary: As discussed, we want to measure the time spent during pickling and unpickling.

Test Plan: CI

Reviewed By: wz337

Differential Revision: D65462767

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @wz337 @wconstab @d4l3k @c-p-i-o
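
For context, here is a minimal sketch of what a timing decorator along these lines might look like. This is illustrative only: the actual `_time_logger` used in `torch.distributed` may differ, and the `logger` setup below is an assumption.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def _time_logger(func):
    """Log the wall-clock duration of each call (illustrative sketch)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.debug("%s took %.6f s", func.__name__, elapsed)
    return wrapper
```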

@pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139757

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 64018b0 with merge base 546318e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the oncall: distributed and release notes: distributed (c10d) labels Nov 5, 2024
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D65462767

@pytorch-bot added the ciflow/trunk label Nov 5, 2024
@_time_logger
def _object_to_tensor(obj, device, group):
    f = io.BytesIO()
    _pickler(f).dump(obj)
@Skylion007 (Collaborator) commented Nov 5, 2024


If you care about pickling time here, serialize with protocol=pickle.HIGHEST_PROTOCOL; that avoids an extra copy in many cases and uses the newer pickle protocols, which are fastest. The downside is that the output won't be deserializable by older versions of Python, but all the nodes here should be on the same Python version.

Actually, the latest pickle protocol (protocol 5) was introduced in Python 3.8, so any supported version of Python should be fine here, and you could set the protocol version explicitly to 5 if you are really worried. See https://peps.python.org/pep-0574/ and https://docs.python.org/3/library/pickle.html for more info.

Suggested change:
-    _pickler(f).dump(obj)
+    _pickler(f, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)

@fduwjj (Contributor, Author) commented Nov 5, 2024


Oh really? Thanks for your suggestion, let me look into this.

@Skylion007 (Collaborator) commented:

@fduwjj Added the arg as a suggestion

@Skylion007 (Collaborator) commented Nov 5, 2024


@fduwjj Just checked: the default pickle protocol is 4 as of Python 3.8. This change updates it to 5, which is a mild but potentially free speed-up for objects that support it (like numpy arrays).
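
As a quick sanity check (illustrative snippet, not part of this PR; it assumes `_pickler` in the code under review defaults to `pickle.Pickler`, which is why `pickle.Pickler` is used directly below):

```python
import io
import pickle

# On Python 3.8+ this prints "4 5": protocol 4 is the default,
# protocol 5 (PEP 574) is the highest available.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Same shape as the code under review, with the protocol made explicit.
obj = {"rank": 0, "payload": b"x" * (1 << 20)}
f = io.BytesIO()
pickle.Pickler(f, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
print(f.getbuffer().nbytes)  # size of the serialized stream
```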

@Skylion007 (Collaborator) commented Nov 5, 2024


Also, pretty sure torch.multiprocessing already uses pickle.HIGHEST_PROTOCOL in our ForkingPickler monkeypatch wrapper :)

ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)

@fduwjj (Contributor, Author) commented:

I see. Somehow, when I tried this protocol in our other applications, I didn't see any perf improvements.

@Skylion007 (Collaborator) commented:

@fduwjj It will only really speed things up if the Python objects implement the new pickle protocol APIs to take advantage of it. It's still what should be done in cases like this one, where the pickle is for ephemeral process-to-process sharing.
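
For illustration, a minimal sketch adapted from PEP 574 showing how a type opts in to the protocol-5 out-of-band buffer API; the `ZeroCopyBytes` class is hypothetical and not part of this PR:

```python
import pickle
from pickle import PickleBuffer

class ZeroCopyBytes(bytearray):
    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose the payload as a PickleBuffer; with a buffer_callback
            # it travels out-of-band, without a copy into the pickle stream.
            return type(self)._reconstruct, (PickleBuffer(self),), None
        # Older protocols fall back to an in-band copy.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            obj = m.obj
            if type(obj) is cls:
                # Out-of-band path: the buffer still wraps the original.
                return obj
            return cls(obj)

blob = ZeroCopyBytes(b"x" * (1 << 20))
buffers = []
data = pickle.dumps(blob, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(data, buffers=buffers)
assert restored == blob  # the 1 MiB payload never entered `data`
```

Types without such a `__reduce_ex__` still pickle correctly under protocol 5; they just don't get the zero-copy benefit.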

@fegin (Contributor) left a comment


LGTM

@facebook-github-bot (Contributor) commented:

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

16 similar comments

pytorchmergebot pushed a commit that referenced this pull request Nov 8, 2024
Summary:
This fixes a regression caused by #139757: we no longer log args and kwargs directly, because if they contain a tensor or tensor subclass, converting them to a string can take a lot of time or even be unsupported.

Reviewed By: fduwjj

Differential Revision: D65490202

Pull Request resolved: #139804
Approved by: https://github.com/XilunWu
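
To make the cost concrete, here is an illustrative sketch (not the actual fix, which stops logging args and kwargs altogether) of why eager stringification hurts in a hot path:

```python
import logging

logger = logging.getLogger(__name__)

def traced(func_name, args, kwargs):
    # Eager: an f-string stringifies every argument (including large
    # tensors) on every call, even when DEBUG logging is disabled.
    # logger.debug(f"{func_name}(args={args}, kwargs={kwargs})")

    # Lazy: %-style arguments are only formatted if a handler actually
    # emits the record, so disabled DEBUG logging costs almost nothing.
    logger.debug("%s(args=%r, kwargs=%r)", func_name, args, kwargs)
```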
atalman pushed a commit to atalman/pytorch that referenced this pull request Nov 11, 2024
@izaitsevfb (Contributor) commented:

@pytorchbot revert -m "reverted internally, see D65682470" -c ghfirst

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Nov 12, 2024
@pytorchmergebot (Collaborator) commented:

@fduwjj your PR has been successfully reverted.
⚠️ This PR might contain internal changes
cc: @pytorch/pytorch-dev-infra

@pytorchmergebot added the Reverted and ci-no-td labels Nov 12, 2024
pytorchmergebot pushed a commit that referenced this pull request Nov 13, 2024
…nd tensor to object (#140414)

Originally we wanted to leverage the timer logger to measure the time spent in object-to-tensor and tensor-to-object conversion (#139757), but that change was reverted (internally) because of a performance regression. We now use a wait counter instead, which is more lightweight.

Pull Request resolved: #140414
Approved by: https://github.com/c-p-i-o, https://github.com/XilunWu, https://github.com/wz337
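
For intuition, a minimal sketch of the wait-counter idea; the real counter in PyTorch is an internal facility with its own API, so everything below is a hypothetical stand-in. The point is that the guard only accumulates a count and a total duration, with no per-call string formatting or I/O:

```python
import time
from contextlib import contextmanager

class WaitCounter:
    """Hypothetical stand-in for a lightweight wait counter."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total_ns = 0

    @contextmanager
    def guard(self):
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            self.count += 1
            self.total_ns += time.perf_counter_ns() - start

object_to_tensor_counter = WaitCounter("pg.object_to_tensor")

with object_to_tensor_counter.guard():
    pass  # e.g. the pickling work inside _object_to_tensor

print(object_to_tensor_counter.count, object_to_tensor_counter.total_ns)
```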
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
@fduwjj fduwjj closed this Dec 6, 2024

Labels

ci-no-td, ciflow/trunk, fb-exported, Merged, oncall: distributed, release notes: distributed (c10d), Reverted


8 participants